Sun Fire 12K and Sun Fire 15K Domain May Panic or "Dstop" During the Reboot Sequence



Category: Availability
Release Phase: Resolved
Product: Sun Fire 12K Server, Sun Fire 15K Server
Bug Id: 4753686
Date of Workaround Release: 21-MAR-2003
Date of Resolved Release: 06-MAY-2003


Impact

A Sun Fire 12K/15K domain may panic or "Dstop" during a reboot initiated with either the reboot(1M) or the "init 6" command from the root user prompt. The domain may panic with a "Safari bus error", "Dstop" with an "Intrupt MappedIn not seen" error, or both.


Contributing Factors

This issue can occur in the following releases:

SPARC Platform

  • Sun Fire 12K/15K with System Management Software (SMS) 1.1
  • Sun Fire 12K/15K with System Management Software (SMS) 1.2 without patch 112488-12
  • Sun Fire 12K/15K with System Management Software (SMS) 1.3 without patch 114608-02

Note: The issue may occur if the domain has fewer CPU modules than IO controllers. Each hPCI board has 2 IO controllers and each wPCI board has 1.
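
The condition in the note above is simple arithmetic. The following is a minimal sketch of it, not an SMS tool: the function and parameter names are hypothetical, and the board and CPU module counts are assumed to be supplied by the administrator. Only the per-board controller counts come from this alert.

```python
# Illustrative sketch only; these names are hypothetical and not part of SMS.
IOC_PER_HPCI = 2  # IO controllers per hPCI board (from the note above)
IOC_PER_WPCI = 1  # IO controllers per wPCI board (from the note above)


def io_controller_count(hpci_boards: int, wpci_boards: int) -> int:
    """Return the number of IO controllers for the given board counts."""
    return hpci_boards * IOC_PER_HPCI + wpci_boards * IOC_PER_WPCI


def may_be_exposed(cpu_modules: int, hpci_boards: int, wpci_boards: int) -> bool:
    """Return True when there are fewer CPU modules than IO controllers."""
    return cpu_modules < io_controller_count(hpci_boards, wpci_boards)
```

For example, a domain with 2 hPCI boards and 1 wPCI board has 5 IO controllers, so under this check it would need at least 5 CPU modules to fall outside the condition.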


Symptoms

The following are sample panic messages:

	Mar  9 15:19:59 2002 ECC_Ctrl=e000000000000000
	Mar  9 15:19:59 2002 UE_AFSR=00000000000fe1ff UE_AFAR=00000f8000000000
	Mar  9 15:19:59 2002 CE_AFSR=00000000000fe1ff CE_AFAR=00000f8000000000
	Mar  9 15:19:59 2002 panic[cpu0]/thread=10408000: Safari bus error: CSR=... ErrCtrl=fc00...
	Sep  9 15:19:59 2002 IntrCtrl=8000000000000017 ErrLog=0000000000000010
	Sep  9 15:19:59 2002 ECC_Ctrl=e000000000000000

The following will be seen in the corresponding "dsmd.dstop" state dump after a Dstop:

	redxl> wfail
	SDI EX00/S0: All SDI is DStopped and RStopped, requested by DARB.
	SDI EX01/S0: Slot 1 port is DStopped, SDI is RStopped, requested by DARB.
	SDI EX02/S0: Slot 1 port is DStopped, SDI is RStopped, requested by DARB.
	SDI EX03/S0: Slot 1 port is DStopped, SDI is RStopped, requested by DARB.
	SDI EX04/S0  Master_Stop_Status0[31:0] = 9000008F
        MStop0[3:0]: All SDI logic is DStopped + Recordstopped.
	SDI EX04/S0  Dstop0[31:0] = 2004A000
        Dstop0[18]: D    DARB texp requests Slot1 Dstop (M)
        Dstop0[29]: D 1E Slot1 asserted Error, enabled to cause Dstop (M)
	EPLD IO04  Err1_Dom1: Mask= B0  Err= 40  1stErr= 40
        Err1[6]:  1E+ Error reported by BBC0
	BBC IO04/BB0   Device_Err_Stat[31:0] = 80008100
        DevErr[    8]:   1E  Port 0 Safari device asserted error
	PCI IOC IO04/P0   Safari_Err_Log[63:0] = 80000000 00000210
        ErrLog[ 4]: Intrupt MappedIn not seen for trans init'd by PCI IOC
        ErrLog[ 9]: ErrOut  Timeout on head of CI queue
        ErrLog[63]: Error Out asserted (S_ERROR_L pin)
	FAIL Port IO4/P0:  Dstop detected by BBC IO4/BB0.
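
When many log files or state dumps must be reviewed, a small script can search them for the two message strings shown above. The following is a minimal sketch: the function name is hypothetical, and a match only indicates that the text resembles these samples. It does not confirm that this bug is the cause.

```python
# Illustrative sketch only; the function name is hypothetical.
# The two strings are copied from the sample messages in this alert.
PANIC_SIGNATURE = "Safari bus error"
DSTOP_SIGNATURE = "Intrupt MappedIn not seen"


def find_signatures(log_text: str) -> set:
    """Return which of the two sample signatures appear in the given text."""
    lowered = log_text.lower()
    found = set()
    if PANIC_SIGNATURE.lower() in lowered:
        found.add("panic")
    if DSTOP_SIGNATURE.lower() in lowered:
        found.add("dstop")
    return found
```

The function takes the file contents as a string, so the caller decides how the message log or "dsmd.dstop" dump is read.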

Workaround

To work around the described issue, do not use reboot(1M) or the "init 6" command to reboot a failing domain.

Reboot the domain as follows:

  • As user root on the domain, shut down the domain using shutdown(1M).
  • As user "sms-svc" on the system controller, use setkeyswitch to set the domain's keyswitch to off and then to on.

Alternatively, make the number of CPU modules greater than or equal to the number of IO controllers.

Or, modify the domain's power-on self-test (POST) operation by adding the following to the platform and/or domain specific ".postrc" file(s) as required (for example, to the "/etc/opt/SUNWSMS/SMS1.2/config/platform/.postrc" file):

	dash_Q_fail

Note: The "dash_Q_fail" condition variable was introduced in SMS 1.2 with patch 112488-10, and is available in SMS 1.3. It is not available in SMS 1.1. For this reason, upgrade to SMS 1.2 with patch 112488-10 or later, or to SMS 1.3, before using this workaround.
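
If the ".postrc" change has to be applied repeatedly (for example, to several domain-specific files), it can be scripted so that the line is added only once. The following is a minimal sketch: the function name is hypothetical, and it assumes the directive sits on a line of its own, as in the example above. Back up the file before changing it.

```python
# Illustrative sketch only; the function name is hypothetical.
# Assumes the directive is written on a line by itself in the .postrc file.
from pathlib import Path


def ensure_directive(postrc_path: str, directive: str = "dash_Q_fail") -> bool:
    """Append the directive if it is not already present.

    Returns True if the file was changed, False if the line already existed.
    """
    path = Path(postrc_path)
    text = path.read_text() if path.exists() else ""
    if any(line.strip() == directive for line in text.splitlines()):
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the directive on its own line
    path.write_text(text + directive + "\n")
    return True
```

Calling the function a second time on the same file leaves it unchanged.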


Resolution

This issue is addressed in the following releases:

SPARC Platform

  • Sun Fire 12K/15K with System Management Software (SMS) 1.2 with patch 112488-12 or later
  • Sun Fire 12K/15K with System Management Software (SMS) 1.3 with patch 114608-02 or later

Note: SMS 1.1 will require an upgrade to SMS 1.2 or SMS 1.3 with the appropriate patch.
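
The patch levels listed above can be checked mechanically. The following is a minimal sketch: the function name is hypothetical, and the list of installed patch IDs is assumed to be collected by the administrator separately. The table covers only the SMS releases named in this alert, so any other version is reported as not fixed.

```python
# Illustrative sketch only; the function name is hypothetical.
import re
from typing import Iterable

# Minimum fixed patch revision per SMS release, from the Resolution list above.
FIXED_PATCH = {"1.2": ("112488", 12), "1.3": ("114608", 2)}


def is_fixed(sms_version: str, installed_patches: Iterable[str]) -> bool:
    """Return True if an installed patch meets the fixed revision for this release."""
    if sms_version not in FIXED_PATCH:
        return False  # for example SMS 1.1, which requires an upgrade
    base, min_rev = FIXED_PATCH[sms_version]
    for patch in installed_patches:
        match = re.fullmatch(r"(\d{6})-(\d{2})", patch.strip())
        if match and match.group(1) == base and int(match.group(2)) >= min_rev:
            return True
    return False
```

For example, SMS 1.2 with patch 112488-10 supports the "dash_Q_fail" workaround but is still reported as not fixed, because the fix requires revision -12 or later.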




Modification History


Date: 27-MAR-2003
  • minor modification to Relief/Workaround section

Date: 06-MAY-2003
  • State: Resolved
  • Updated Contributing Factors and Resolution section




Attachments
This solution has no attachment

 
 
Article Details
Article ID: 201307
Article Type: Sun Alert
Last reviewed: 2003-05-06
Audience: PUBLIC