Sun Fire 12K/15K/E20K/E25K Domains Running Solaris 8 2/04 May Experience Bus Error When Using Dynamic Reconfiguration

Category: Availability
Release Phase: Resolved
Product: Sun Fire 12K Server, Sun Fire E20K Server, Sun Fire 15K Server, Sun Fire E25K Server
Bug Id: 6532060
Date of Workaround Release: 24-JUL-2007
Date of Resolved Release: 20-Jun-2008

1. Impact
On a Sun Fire 12K/15K/E20K/E25K domain that runs Solaris 8 2/04 and includes one or more HsPCI+ assemblies, using Dynamic Reconfiguration (DR) to detach the board hosting the domain's permanent memory may trigger a "Safari Bus Error", causing a domain outage.
2. Contributing Factors
This issue can occur on the following platforms:
SPARC Platform
- Sun Fire 12K/15K/E20K/E25K domains running Solaris 8 2/04 without patch 116962-13
Note: Sun Fire 12K/15K/E20K/E25K domains running Solaris 9 and 10 are not affected by this issue.
This issue will only occur if both of the following conditions are true:
- A Dynamic Reconfiguration (DR) detach is attempted on the board hosting the permanent (kernel) memory
- One or more HsPCI+ boards are installed in the domain
To determine whether the domain includes HsPCI+ assemblies (shown as "HPCI+" in the output), run the following command:
sms-svc% showboards -v -d 0 | grep HPCI
IO0 On HPCI+ Active Passed 0
IO1 On HPCI+ Active Passed 0
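When several domains need to be checked, the same test can be scripted. The following is a minimal sketch in Python, assuming the column layout of the two sample lines above (slot name in the first field, board type in the third); the function name hspci_plus_slots is hypothetical and is not part of SMS.

```python
from typing import List


def hspci_plus_slots(showboards_output: str) -> List[str]:
    """Return the slot names whose board type field is "HPCI+".

    Assumes each board line looks like the sample in this Sun Alert:
    "<slot> <power> <type> <status> <test status> <domain>".
    """
    slots = []
    for line in showboards_output.splitlines():
        fields = line.split()
        # The third field holds the board type in the sample output.
        if len(fields) >= 3 and fields[2] == "HPCI+":
            slots.append(fields[0])
    return slots


sample = """\
IO0 On HPCI+ Active Passed 0
IO1 On HPCI+ Active Passed 0
"""
print(hspci_plus_slots(sample))
```

An empty list means no HsPCI+ assembly was found in the supplied output, so the second contributing factor does not apply.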
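To determine whether the board to be detached hosts the kernel (permanent) memory, look for a "permanent" entry in the cfgadm output for that board, as in the following example, where SB1 hosts the permanent memory: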
May 10 10:21:08 2007 root# cfgadm -av | grep perm
May 10 10:21:10 2007 SB1::memory
connected configured ok
base address 0x1e000000000, 8388608 KBytes total, 2313832 KBytes permanent
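The check above can also be done programmatically on captured "cfgadm -av" output. The sketch below is illustrative only: it assumes memory attachment points are named "SBn::memory" and that the permanent size appears as "<number> KBytes permanent", as in the example, and it tolerates the entry being wrapped over several lines. The function name permanent_memory_boards is hypothetical.

```python
import re
from typing import Dict

# Patterns taken from the example output in this Sun Alert.
AP_ID = re.compile(r"\b(SB\d+)::memory\b")
PERMANENT = re.compile(r"(\d+) KBytes permanent")


def permanent_memory_boards(cfgadm_output: str) -> Dict[str, int]:
    """Map each system board hosting permanent memory to its size in KBytes."""
    boards: Dict[str, int] = {}
    current = None  # most recent SBn::memory attachment point seen
    for line in cfgadm_output.splitlines():
        ap = AP_ID.search(line)
        if ap:
            current = ap.group(1)
        perm = PERMANENT.search(line)
        if perm and current is not None:
            kbytes = int(perm.group(1))
            if kbytes > 0:
                boards[current] = kbytes
    return boards


sample = """\
May 10 10:21:10 2007 SB1::memory
connected configured ok
base address 0x1e000000000, 8388608 KBytes total, 2313832 KBytes permanent
"""
print(permanent_memory_boards(sample))
```

Any board that appears in the result is one whose detach requires the copy/rename operation described in the Symptoms section.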
3. Symptoms
During the copy/rename operation, the domain will experience a "Safari Bus Error" causing a domain outage, as in the following example:
May 10 10:26:01 2007 root# cfgadm -c disconnect SB1
May 10 10:26:21 2007 System may be temporarily suspended, proceed (yes/no)? yes
May 10 10:26:30 2007 May 10 10:26:23 DATA01 dr: OS unconfigure dr@0:SB1::cpu0
May 10 10:26:32 2007 May 10 10:26:25 DATA01 dr: OS unconfigure dr@0:SB1::memory
May 10 10:28:14 2007
May 10 10:28:14 2007 DR: checking devices...
May 10 10:28:14 2007 DR: suspending user threads...
May 10 10:28:15 2007 DR: suspending kernel daemons...
May 10 10:28:15 2007 DR: suspending drivers...
May 10 10:28:15 2007 suspending pci108e,c416@2 (aka sbbc)
May 10 10:28:15 2007 suspending pci100b,35@0 (aka ce)
May 10 10:28:15 2007 suspending pci100b,35@1 (aka ce)
May 10 10:28:15 2007 suspending sd@8,0
May 10 10:28:15 2007 suspending sd@9,0
May 10 10:28:15 2007 suspending pci1000,b@2 (aka glm)
May 10 10:28:15 2007 suspending pci1000,b@2,1 (aka glm)
May 10 10:28:15 2007 suspending pciclass,060400@1 (aka pci_pci)
May 10 10:28:15 2007 suspending pci108e,1101@3,1 (aka eri)
May 10 10:28:15 2007 suspending pciclass,0c0310@3,3 (aka ohci)
May 10 10:28:15 2007 suspending pciclass,060400@1 (aka pci_pci)
May 10 10:28:15 2007 suspending pci108e,8002@1c,700000 (aka pcisch)
May 10 10:28:15 2007 suspending pci100b,35@0 (aka ce)
May 10 10:28:15 2007 suspending pci100b,35@1 (aka ce)
May 10 10:28:15 2007 suspending pci100b,35@2 (aka ce)
May 10 10:28:15 2007 suspending pci100b,35@3 (aka ce)
May 10 10:28:15 2007 suspending pciclass,060400@1 (aka pci_pci)
May 10 10:28:15 2007 suspending pci108e,8002@1c,600000 (aka pcisch)
May 10 10:28:15 2007 Safari bus error: CSR=0155555501c01e77 ErrCtrl=f8000000000003e0
May 10 10:28:15 2007 IntrCtrl=80000000000fc017 ErrLog=0000000000080000
May 10 10:28:15 2007 ECC_Ctrl=8000000000000000
May 10 10:28:15 2007 UE_AFSR=000001025b890138 UE_AFAR=0000088276090900
May 10 10:28:15 2007 CE_AFSR=0000000d86890111 CE_AFAR=0000014296d76a00
May 10 10:28:15 2007 FirstErrLog=0000000000080000 FirstErrorAddr=0000000000000000
May 10 10:28:15 2007 LeafStatus=0000000000000000
May 10 10:28:15 2007 panic[cpu3]/thread=2a10034fd20: Safari bus error: CSR=0155555501c01e77 ErrCtrl=f8000000000003e0
May 10 10:28:15 2007 IntrCtrl=80000000000fc017 ErrLog=0000000000080000
May 10 10:28:15 2007 ECC_Ctrl=8000000000000000
May 10 10:28:15 2007 UE_AFSR=0000010
May 10 10:28:16 2007 syncing file systems... done
In the above example, the CSR value points to one of the HsPCI+ assemblies installed in the domain (in this case, CSR=0155555501c01e77 ==> IO0/P0).
As a consequence, 'dsmd.hwconfig' and 'dsmd.dump' files are generally produced. Examining the 'dsmd.hwconfig' dump file with 'redx' reports a parity error in the internal memory of the I/O controller identified by the CSR value:
redxl> shioc 0 1 0
xmits IO00/P0 (0.1.0) Component ID = 34651049 TO_2.1
...
Safari_Err_Log[63:0] = 00000000.00080000
Safari_1st_Err_Log[63:0] = 00000000.00080000
Safari_Err_Enbl[63:0] = F8000000.000003E0
Safari_Err_Int_Enbl[63:0] = 80000000.000FC017
ErrLog[19]: 1E Intrupt Internal Parity Error in PCI-B Leaf Logic
1st_Err_Data[59:0] = 0000000.00000000
...
...
Note: Data is displayed from the currently loaded dump file.
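The ErrLog value in the console message and the "ErrLog[19]" line reported by redx describe the same condition: 0x0000000000080000 has only bit 19 set. The following Python sketch, offered as an illustration rather than a supported tool, extracts the ErrLog field from a captured console message and lists the bits that are set. The function name errlog_bits is hypothetical, and the sketch does not attempt to decode the meaning of any bit.

```python
import re
from typing import List

# Matches "ErrLog=<16 hex digits>" but not "FirstErrLog=...".
ERRLOG = re.compile(r"\bErrLog=([0-9a-fA-F]{16})")


def errlog_bits(console_text: str) -> List[int]:
    """Return the bit positions set in the first ErrLog value found."""
    match = ERRLOG.search(console_text)
    if match is None:
        return []
    value = int(match.group(1), 16)
    return [bit for bit in range(64) if (value >> bit) & 1]


sample = (
    "Safari bus error: CSR=0155555501c01e77 ErrCtrl=f8000000000003e0\n"
    "IntrCtrl=80000000000fc017 ErrLog=0000000000080000\n"
)
print(errlog_bits(sample))
```

For the console message shown in this Sun Alert, the only bit reported is 19, which matches the redx line "ErrLog[19]".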
4. Workaround
Until the patch can be applied (or the system is upgraded to a later Solaris OS version), it is recommended to avoid detaching the system board hosting the kernel memory of domains running Solaris 8 2/04 and composed of HsPCI+ assemblies.
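As an aid to applying this workaround, the conditions listed under Contributing Factors can be combined into a simple pre-check before a detach is attempted. The sketch below is illustrative only. The function name and its parameters are hypothetical, the caller is expected to gather the inputs (for example, the installed revision of patch 116962 from "showrev -p"), and the sketch assumes that revisions later than -13 also contain the fix.

```python
from typing import Optional

# Patch 116962-13 is the revision named under Contributing Factors.
FIXED_PATCH_REVISION = 13


def detach_is_risky(os_release: str,
                    patch_revision: Optional[int],
                    board_has_permanent_memory: bool,
                    hspci_plus_count: int) -> bool:
    """Return True when a DR detach matches the conditions in this Sun Alert.

    patch_revision is the installed revision of patch 116962,
    or None if the patch is not installed.
    """
    if os_release != "Solaris 8 2/04":
        return False  # only Solaris 8 2/04 is listed as affected
    if patch_revision is not None and patch_revision >= FIXED_PATCH_REVISION:
        return False  # assumption: -13 and later revisions include the fix
    # Both conditions must hold for the issue to occur.
    return board_has_permanent_memory and hspci_plus_count >= 1


print(detach_is_risky("Solaris 8 2/04", 12, True, 2))
```

If the function returns True, the detach should be postponed until the patch has been applied.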
5. Resolution
This issue is addressed in the following release:
SPARC Platform
- Solaris 8 2/04 with patch 116962-13
This Sun Alert notification is being provided to you on an "AS IS"
basis. This Sun Alert notification may contain information provided by
third parties. The issues described in this Sun Alert notification may
or may not impact your system(s). Sun makes no representations,
warranties, or guarantees as to the information contained herein. ANY
AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU
ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT
OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This
Sun Alert notification contains Sun proprietary and confidential
information. It is being provided to you pursuant to the provisions of
your agreement to purchase services from Sun, or, if you do not have
such an agreement, the Sun.com Terms of Use. This Sun Alert
notification may only be used for the purposes contemplated by these
agreements.
Copyright 2000-2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.
Modification History
20-Jun-2008: Updated Contributing Factors and Resolution sections; now Resolved