On Rare Occasions, Sun Fire 12K/15K/20K/25K Servers May Experience an Unexpected "Dstop" During a Recovery POST



Category : Availability
Release Phase : Resolved
Product : Sun Fire 12K Server, Sun Fire E20K Server, Sun Fire 15K Server, Sun Fire E25K Server
Bug Id : 4943893
Date of Resolved Release : 02-JUN-2004


Impact

A domain on Sun Fire 12K/15K/20K/25K servers may experience a Domain Stop (Dstop) during a recovery POST, which SMS runs as hpost(1M) with the "-Q" (quick) option. This issue occurs after a Solaris domain has reset and SMS attempts to recover the domain. Only the domain that has reset is affected. Recovery of that domain takes longer because POST is then rerun without "-Q".


Contributing Factors

This issue can occur in the following releases:

SPARC Platform

  • Sun Fire 12K/15K with SMS 1.2
  • Sun Fire 12K/15K with SMS 1.3 without patch 114608-09
  • Sun Fire 12K/15K with SMS 1.4
  • Sun Fire 12K/15K/20K/25K with SMS 1.4.1 without patch 117371-02
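
As a rough illustration, the release list above can be expressed as a small lookup. This is a minimal sketch, not a Sun-supplied tool: the function name `is_affected` and the idea of passing in a list of installed patch IDs are assumptions made for the example, and only the SMS versions and patch revisions named in this alert are encoded.

```python
import re

# Minimum fixed patch revision per SMS release, as listed in this alert.
# None means the alert names no patch for that release (upgrade required).
FIXED_PATCH = {
    "1.2": None,
    "1.3": ("114608", 9),
    "1.4": None,
    "1.4.1": ("117371", 2),
}


def is_affected(sms_version, installed_patches):
    """Return True if this SMS release and patch level is exposed to the issue.

    installed_patches is a hypothetical list of patch IDs such as "114608-09".
    """
    if sms_version not in FIXED_PATCH:
        raise ValueError(f"SMS version not covered by this alert: {sms_version}")
    fix = FIXED_PATCH[sms_version]
    if fix is None:
        return True
    base, min_rev = fix
    for patch in installed_patches:
        m = re.fullmatch(r"(\d{6})-(\d{2})", patch.strip())
        # Same patch base at the fixed revision or later clears the issue.
        if m and m.group(1) == base and int(m.group(2)) >= min_rev:
            return False
    return True
```

For example, `is_affected("1.3", ["114608-08"])` reports the release as affected, while the same call with `"114608-09"` does not.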

Symptoms

Should the described issue occur, the POST log will show the Dstop and an "xcstate" file will be generated under the domain dump directory, similar to the following example:

    DSTOP Detected for Slot SB17
    No valid errors in any SDI master stop1 status in this dump.
                Proceeding using local first-error status.
    SDI EX17/S0  Master_Stop_Status0[31:0] = 10000047
                MStop0[0]: All SDI logic is DStopped
                MStop0[1]: Slot 0 port is DStopped
                MStop0[2]: Slot 1 port is DStopped
                MStop0[6]: L1 Slot0 error line detected
    SDI EX17/S0  Dstop0[31:0] = 02008200
                Dstop0[25]: D 1E AXQ requests all Dstop (M)
    AXQ EX17 (17) Error_Flag_06[31:0] = 00800080  Mask = 7E00FFFF
                Err6[23]: D    Attempt to deallocate a deallocated Home WATransID
    AXQ EX17 (17) Error_Flag_07[31:0] = 08000800  Mask = 63FF7D24
                Err7[27]: D    Home Agent counter underflow
    FAIL EXB EX17: SDI EX17/S0 detected Error request from AXQ, no reason found.
    Primary service FRU is EXB EX17.
    FAIL ENTIRE SYSTEM: FAIL & system_red_any_fail (-Q) set
    There is no FRU service action indicated for this failure.
    System state dumped to

    /var/opt/SUNWSMS/SMS1.3/adm/A/dump/xcstate.031016.1451.30
    Boards in dump: master SC    CPs/CSBs[1:0]: 3
                  EXB[17:0]: 20000
                Slot0[17:0]: 20000
                Slot1[17:0]: 00000
    Recordstop analysis resulting in FAIL, handled as Dstop
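
To check a saved POST log for this signature, the key lines of the example above can be pulled out with a few regular expressions. The sketch below is an assumption-laden illustration: the function name `summarize_post_log` and the patterns are derived only from the sample log in this alert, not from any documented hpost log format, and the `(-Q) set` test is a simple heuristic.

```python
import re

# Patterns derived from the example log in this alert (not an official format).
DSTOP_RE = re.compile(r"DSTOP Detected for Slot (\S+)")
ERR_RE = re.compile(r"(Err\d+\[\d+\]):[ \t]+\S+[ \t]+(.*\S)")
DUMP_RE = re.compile(r"(/\S*xcstate\.\S+)")


def summarize_post_log(text):
    """Extract the Dstop slot, AXQ error bits, and xcstate dump path."""
    slot = DSTOP_RE.search(text)
    dump = DUMP_RE.search(text)
    return {
        "dstop_slot": slot.group(1) if slot else None,
        "axq_errors": [(m.group(1), m.group(2)) for m in ERR_RE.finditer(text)],
        # Heuristic: the failure line mentions that the quick option was set.
        "quick_post": "(-Q) set" in text,
        "xcstate": dump.group(1) if dump else None,
    }


# A few lines copied from the example above.
SAMPLE = """\
DSTOP Detected for Slot SB17
            Err7[27]: D    Home Agent counter underflow
FAIL ENTIRE SYSTEM: FAIL & system_red_any_fail (-Q) set
System state dumped to
/var/opt/SUNWSMS/SMS1.3/adm/A/dump/xcstate.031016.1451.30
"""

summary = summarize_post_log(SAMPLE)
```

A log that reports "Home Agent counter underflow" together with `quick_post` being true is consistent with the failure described here, although other faults could produce similar lines.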

Workaround

Normally, SMS automatically reruns POST without "-Q" after this type of issue, and allowing that automatic rerun is the recommended method of recovery. If the domain must be recovered manually, use the "setkeyswitch off" and "setkeyswitch on" commands to force a regular (i.e. not -Q) POST to run.
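
If the manual keyswitch cycle is scripted, it might look like the following sketch. It assumes the `-d domain` form described in setkeyswitch(1M), that it runs on the System Controller under an account with the needed SMS privileges, and that domain "A" (taken from the dump path in the example log) is the target. The helper names and the `dry_run` flag are hypothetical. By default the script only prints the commands, because setkeyswitch may prompt for confirmation when run for real.

```python
import subprocess


def keyswitch_cycle_commands(domain):
    """Build the off/on setkeyswitch invocations that force a full POST."""
    if not domain:
        raise ValueError("a domain ID or tag is required")
    return [["setkeyswitch", "-d", domain, position] for position in ("off", "on")]


def force_full_post(domain, dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    for cmd in keyswitch_cycle_commands(domain):
        if dry_run:
            print(" ".join(cmd))
        else:
            # Stop at the first failure instead of turning the keyswitch on blindly.
            subprocess.run(cmd, check=True)


force_full_post("A")
```

Running the off and on steps as separate commands keeps a failed "off" from being followed by an "on".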


Resolution

This issue is addressed in the following releases:

SPARC Platform

  • Sun Fire 12K/15K with SMS 1.3 patch 114608-09 or later
  • Sun Fire 12K/15K/20K/25K with SMS 1.4.1 patch 117371-02 or later

For Sun Fire 12K/15K running SMS 1.2 or SMS 1.4, an upgrade to SMS 1.4.1 (with patch 117371-02 or later) is required.





 
 

Article Details
Article ID : 200289
Article Type : Sun Alert
Last reviewed : 2004-05-25
Audience : PUBLIC