Sun Fire 12K/15K/E20K/E25K Domains Running Solaris 8 2/04 May Experience Bus Error When Using Dynamic Reconfiguration



Category: Availability
Release Phase: Resolved
Product: Sun Fire 12K Server, Sun Fire E20K Server, Sun Fire 15K Server, Sun Fire E25K Server
Bug Id: 6532060
Date of Workaround Release: 24-JUL-2007
Date of Resolved Release: 20-Jun-2008



1. Impact

On a Sun Fire 12K/15K/E20K/E25K domain that runs Solaris 8 2/04 and includes one or more HsPCI+ assemblies, using Dynamic Reconfiguration (DR) to detach the board hosting the domain's permanent memory may trigger a "Safari Bus Error", causing a domain outage.


2. Contributing Factors

This issue can occur on the following platforms:

SPARC Platform

  • Sun Fire 12K/15K/E20K/E25K domains running Solaris 8 2/04 without patch 116962-13
Note: Sun Fire 12K/15K/E20K/E25K domains running Solaris 9 and 10 are not affected by this issue.
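
The patch level named above can be checked on the domain with a small filter over the output of `showrev -p`. The following is a minimal sketch, not an official procedure: the function name `has_fixed_patch` is hypothetical, and it assumes `showrev -p` prints one line per patch in the form `Patch: 116962-13 Obsoletes: ...`. On Solaris 8 the `awk` call may need to be replaced with `nawk`.

```shell
# Hypothetical helper: reads `showrev -p` output on stdin and succeeds
# (exit status 0) only if patch 116962 is listed at revision 13 or higher.
# Assumed input format: "Patch: <id>-<rev> Obsoletes: ..." (one patch per line).
has_fixed_patch() {
    awk '
        $1 == "Patch:" {
            split($2, id, "-")                  # id[1] = patch ID, id[2] = revision
            if (id[1] == "116962" && id[2] + 0 >= 13) found = 1
        }
        END { exit found ? 0 : 1 }
    '
}
```

A possible use on the domain is `showrev -p | has_fixed_patch && echo "116962-13 or later installed"`.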

This issue will only occur if both of the following conditions are true:

  1. A Dynamic Reconfiguration (DR) detach is attempted on the board hosting the permanent (kernel) memory
  2. One or more HsPCI+ boards are installed in the domain


To determine whether the domain includes HsPCI+ assemblies, run the following command:

    sms-svc% showboards -v -d 0 | grep HPCI
    IO0     On     HPCI+       Active      Passed    0
    IO1     On     HPCI+       Active      Passed    0
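
Because a plain `grep HPCI` also matches any board whose type is reported as `HPCI`, an exact match on the type column may be preferable. The sketch below assumes the column layout shown in the sample above (slot, power, type, state, test status, domain); the function name `list_hpci_plus_slots` is hypothetical.

```shell
# Hypothetical helper: reads `showboards -v -d <domain>` output on stdin,
# prints the slot name of every board whose type column is exactly "HPCI+",
# and succeeds (exit status 0) only if at least one such board was found.
# Assumed columns: slot, power, type, state, test status, domain.
list_hpci_plus_slots() {
    awk '
        $3 == "HPCI+" { print $1; n++ }
        END { exit (n > 0) ? 0 : 1 }
    '
}
```

For example, `showboards -v -d 0 | list_hpci_plus_slots` would print `IO0` and `IO1` for the output shown above.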


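To determine whether the board to be detached hosts the permanent (kernel) memory, check the memory attachment points with `cfgadm`. In the following example, SB1 hosts the permanent memory: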

    May 10 10:21:08 2007 root# cfgadm -av | grep perm
    May 10 10:21:10 2007 SB1::memory
    connected    configured   ok     
    base address 0x1e000000000, 8388608 KBytes total, 2313832 KBytes permanent
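
The same check can be scripted so that the board name is reported directly. This is a sketch under assumptions: the function name `boards_with_permanent_memory` is hypothetical, and it relies on the memory attachment point (for example `SB1::memory`) appearing on or before the line that contains `KBytes permanent`, as in the sample above.

```shell
# Hypothetical helper: reads `cfgadm -av` output on stdin and prints the
# system board (e.g. "SB1") for each memory attachment point whose
# information line reports "KBytes permanent".
# Assumption: the "SBn::memory" token appears on the same line as, or on
# a line before, the matching "KBytes permanent" text.
boards_with_permanent_memory() {
    awk '
        {
            # Remember the most recent memory attachment point seen.
            for (i = 1; i <= NF; i++)
                if ($i ~ /^SB[0-9]+::memory$/) ap = $i
        }
        /KBytes permanent/ && ap != "" {
            split(ap, part, "::")
            print part[1]
        }
    '
}
```

For example, `cfgadm -av | boards_with_permanent_memory` can be run on the domain before a detach is attempted; any board it prints should not be detached while the conditions above apply.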

3. Symptoms


During the copy/rename operation, the domain will experience a "Safari Bus Error" causing a domain outage, as in the following example:

    May 10 10:26:01 2007 root# cfgadm -c disconnect SB1 
    May 10 10:26:21 2007 System may be temporarily suspended, proceed (yes/no)? yes 
    May 10 10:26:30 2007 May 10 10:26:23 DATA01 dr: OS unconfigure dr@0:SB1::cpu0  
    May 10 10:26:32 2007 May 10 10:26:25 DATA01 dr: OS unconfigure dr@0:SB1::memory  
    May 10 10:28:14 2007  
    May 10 10:28:14 2007 DR: checking devices... 
    May 10 10:28:14 2007 DR: suspending user threads... 
    May 10 10:28:15 2007 DR: suspending kernel daemons... 
    May 10 10:28:15 2007 DR: suspending drivers... 
    May 10 10:28:15 2007    suspending pci108e,c416@2 (aka sbbc) 
    May 10 10:28:15 2007    suspending pci100b,35@0 (aka ce) 
    May 10 10:28:15 2007    suspending pci100b,35@1 (aka ce) 
    May 10 10:28:15 2007    suspending sd@8,0 
    May 10 10:28:15 2007    suspending sd@9,0 
    May 10 10:28:15 2007    suspending pci1000,b@2 (aka glm) 
    May 10 10:28:15 2007    suspending pci1000,b@2,1 (aka glm) 
    May 10 10:28:15 2007    suspending pciclass,060400@1 (aka pci_pci) 
    May 10 10:28:15 2007    suspending pci108e,1101@3,1 (aka eri) 
    May 10 10:28:15 2007    suspending pciclass,0c0310@3,3 (aka ohci) 
    May 10 10:28:15 2007    suspending pciclass,060400@1 (aka pci_pci) 
    May 10 10:28:15 2007    suspending pci108e,8002@1c,700000 (aka pcisch) 
    May 10 10:28:15 2007    suspending pci100b,35@0 (aka ce) 
    May 10 10:28:15 2007    suspending pci100b,35@1 (aka ce) 
    May 10 10:28:15 2007    suspending pci100b,35@2 (aka ce) 
    May 10 10:28:15 2007    suspending pci100b,35@3 (aka ce) 
    May 10 10:28:15 2007    suspending pciclass,060400@1 (aka pci_pci) 
    May 10 10:28:15 2007    suspending pci108e,8002@1c,600000 (aka pcisch) 
    May 10 10:28:15 2007 Safari bus error:  CSR=0155555501c01e77 ErrCtrl=f8000000000003e0 
    May 10 10:28:15 2007    IntrCtrl=80000000000fc017 ErrLog=0000000000080000 
    May 10 10:28:15 2007    ECC_Ctrl=8000000000000000 
    May 10 10:28:15 2007    UE_AFSR=000001025b890138 UE_AFAR=0000088276090900 
    May 10 10:28:15 2007    CE_AFSR=0000000d86890111 CE_AFAR=0000014296d76a00 
    May 10 10:28:15 2007    FirstErrLog=0000000000080000 FirstErrorAddr=0000000000000000 
    May 10 10:28:15 2007    LeafStatus=0000000000000000 
    May 10 10:28:15 2007  panic[cpu3]/thread=2a10034fd20: Safari bus error:  CSR=0155555501c01e77 ErrCtrl=f8000000000003e0 
    May 10 10:28:15 2007    IntrCtrl=80000000000fc017 ErrLog=0000000000080000 
    May 10 10:28:15 2007    ECC_Ctrl=8000000000000000 
    May 10 10:28:15 2007    UE_AFSR=0000010 
    May 10 10:28:16 2007 syncing file systems... done

In the above example, the CSR value points to one of the HsPCI+ assemblies installed in the domain (in this case, CSR=0155555501c01e77 ==> IO0/P0).

Typically, 'dsmd.hwconfig' and 'dsmd.dump' files are generated as a result. Examining the 'dsmd.hwconfig' dump file with 'redx' shows a parity error in the internal memory of the I/O controller identified by the CSR value:

    redxl> shioc 0 1 0
    xmits IO00/P0 (0.1.0)   Component ID = 34651049    TO_2.1
    ...
       Safari_Err_Log[63:0]      = 00000000.00080000
       Safari_1st_Err_Log[63:0]  = 00000000.00080000
       Safari_Err_Enbl[63:0]     = F8000000.000003E0
       Safari_Err_Int_Enbl[63:0] = 80000000.000FC017
       ErrLog[19]: 1E Intrupt Internal Parity Error in PCI-B Leaf Logic
       1st_Err_Data[59:0] = 0000000.00000000
    ...
    ...

Note: Data is displayed from the currently loaded dump file.


4. Workaround

Until the patch can be applied (or the domain is upgraded to a later Solaris OS version), avoid detaching the system board that hosts the kernel memory on domains that run Solaris 8 2/04 and include HsPCI+ assemblies.


5. Resolution

This issue is addressed in the following release:

SPARC Platform

  • Solaris 8 2/04 with patch 116962-13 or later


This Sun Alert notification is being provided to you on an "AS IS" basis. This Sun Alert notification may contain information provided by third parties. The issues described in this Sun Alert notification may or may not impact your system(s). Sun makes no representations, warranties, or guarantees as to the information contained herein. ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This Sun Alert notification contains Sun proprietary and confidential information. It is being provided to you pursuant to the provisions of your agreement to purchase services from Sun, or, if you do not have such an agreement, the Sun.com Terms of Use. This Sun Alert notification may only be used for the purposes contemplated by these agreements.

Copyright 2000-2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.



Modification History

20-Jun-2008: Updated Contributing Factors and Resolution sections; now Resolved




Article Details
Article ID : 201342
Article Type : Sun Alert
Last reviewed : 2008-06-20
Audience : PUBLIC