Sun Fire V440 and Netra 440 Systems Using a Specific Networking Configuration May Unexpectedly Reset

| Category : | Availability |
| Release Phase : | Resolved |
| Product : | Sun Fire V440 Server, Netra 440 Server |
| Bug Id : | 5039862 |
| Date of Resolved Release : | 29-SEP-2005 |

Impact
Under certain conditions using a specific network configuration, the Sun Fire V440 or Netra 440 system may experience an unexpected reset and reboot.
Contributing Factors
This issue can occur in the following releases:
SPARC Platform
This issue occurs only when system bus signal activity coincides with specific PCI bus signal activity on the first onboard Ethernet interface. Under Solaris this is typically the logical device "ce0"; physically it is the Ethernet RJ45 connector NET 0.
Symptoms
If the described issue occurs, the system resets and the following error message appears on the console:
Fatal Error Reset
SC Alert: Host System has Reset
The system then reboots. No core files are generated, and the reset output will not be logged to the "/var/adm/messages" file.
If you suspect that the system is experiencing this issue, change the OBP variables as follows to provide more verbose output in the event of another occurrence.
Note: The OBP settings below are recommended only for verifying whether the system is experiencing this issue and should not be used long term. Record the original values before changing them, and restore those values once the failure is verified. The following settings provide more verbose output:
diag-switch? true
post-trigger none
obdiag-trigger none
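Because the original values must be restored afterwards, it can help to capture them before making the change. The following is a minimal sketch, assuming the current settings were saved from the Solaris "eeprom" command, which prints one name=value pair per line. The function names and the sample values are hypothetical and are not part of this document.

```python
# Variables changed for the diagnostic run, taken from the settings above.
DIAG_VARS = ("diag-switch?", "post-trigger", "obdiag-trigger")


def original_values(eeprom_output: str) -> dict:
    """Extract the current values of DIAG_VARS from name=value lines."""
    saved = {}
    for line in eeprom_output.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name in DIAG_VARS:
            saved[name] = value
    return saved


def restore_commands(saved: dict) -> list:
    """Build OBP setenv lines that put the saved values back."""
    return [f"setenv {name} {value}" for name, value in saved.items()]


# Hypothetical sample; the real values depend on the system.
sample = """auto-boot?=true
diag-switch?=false
post-trigger=power-on-reset error-reset
obdiag-trigger=power-on-reset error-reset
"""

saved = original_values(sample)
commands = restore_commands(saved)
for command in commands:
    print(command)
```

The generated lines are intended to be typed at the "ok" prompt once the failure has been verified.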
When the parameters above are set, the error message will include some additional information indicating the reset reason as "PBM FATAL", with a PCI IO-Bridge register output similar to:
Fatal Error Reset
SC Alert: Host System has Reset
@(#)OBP 4.10.10 2003/08/29 06:25 Sun Fire V440
Clearing TLBs
Loading Configuration
Membase: 0000.0033.0000.0000
MemSize: 0000.0000.4000.0000
Init CPU arrays Done
Init E$ tags Done
Setup TLB Done
MMUs ON
Scrubbing Tomatillo tags... 0 1
Block Scrubbing Done
Find dropin, Copying Done, Size 0000.0000.0000.5ca0
PC = 0000.07ff.f000.4c88
PC = 0000.0000.0000.4d28
Find dropin, (copied), Decompressing Done, Size 0000.0000.0006.6700
ttya initialized
System Reset: (PBM FATAL)
JBUS-PCI bridge
JBUS-PCI bridge
slave Error Register: 8000000000001000
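Where console output is captured to a file, the signature above can be searched for automatically. The following is a minimal sketch, assuming the capture is available as plain text; the function names are hypothetical. A match only suggests that a reset carried the "PBM FATAL" reason and does not by itself confirm this issue.

```python
import re

# Strings taken from the console output shown above.
RESET_BANNER = "Fatal Error Reset"
RESET_REASON = re.compile(r"System Reset:\s*\((?P<reason>[^)]*)\)")


def reset_reasons(console_log: str) -> list:
    """Return every reset reason reported in the captured console text."""
    return [m.group("reason").strip() for m in RESET_REASON.finditer(console_log)]


def matches_pbm_fatal(console_log: str) -> bool:
    """True if the capture shows the reset banner and a PBM FATAL reason."""
    return RESET_BANNER in console_log and "PBM FATAL" in reset_reasons(console_log)


# Excerpt of the output shown above, used as sample input.
sample = """Fatal Error Reset
SC Alert: Host System has Reset
ttya initialized
System Reset: (PBM FATAL)
JBUS-PCI bridge
slave Error Register: 8000000000001000
"""

print(matches_pbm_fatal(sample))
```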
Workaround
To work around the described issue, use the steps provided below:
1a) If the application only requires a single network port, use only the second onboard Ethernet interface, net1 (ce1).
OR
1b) If the application requires multiple network ports, install a PCI Ethernet card in any available PCI slot. Placing the card in a 33MHz slot (Slot 0, 1 or 3) may lower performance relative to a 66MHz slot (Slot 2, 4 or 5). Slot 5 is preferred.
2) It is highly recommended that the ce0 interface be completely disabled, so that the onboard net0 port (ce0) is not accessed inadvertently in a manner that could trigger this issue (e.g. by SunVTS). Because of Solaris instance numbering, it is also recommended that this be done after the initial Solaris installation, to ensure that net1 is assigned the ce1 instance instead of ce0.
To completely disable onboard net0 (ce0) from the system, use the following commands to install an NVRAM script at the OBP "ok" prompt:
ok nvedit
0: probe-all install-console banner
1: " /pci@1c,600000/network@2" $delete-device drop
2:
^C
Type "Ctrl-C" to exit nvedit as shown above. Then continue with:
ok nvstore
ok setenv use-nvramrc? true
use-nvramrc? = true
ok reset-all
After the system resets, net0 (ce0) should not be visible to OBP (i.e. you should not see a path to net0 [/pci@1c,600000/network@2] when you run "show-devs" from OBP), and the net0 (ce0) device should not be seen by Solaris (e.g. in the output of the prtconf or prtpicl commands).
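The check described above can also be applied to a saved device listing. The following is a minimal sketch, assuming the listing contains one full device path per line, as in captured "show-devs" output. The function name and both sample listings are hypothetical; only the net0 path comes from this document.

```python
# net0 device path as given in this document.
NET0_PATH = "/pci@1c,600000/network@2"


def net0_present(device_listing: str) -> bool:
    """True if any whitespace-separated token in the listing is the net0 path."""
    for line in device_listing.splitlines():
        if NET0_PATH in line.split():
            return True
    return False


# Hypothetical listings for illustration only.
before_disable = """/pci@1c,600000
/pci@1c,600000/network@2
/openprom
"""
after_disable = """/pci@1c,600000
/openprom
"""

print(net0_present(before_disable), net0_present(after_disable))
```

If the function still reports the device after "reset-all", a reasonable first check is that the NVRAM script was stored and that use-nvramrc? is set to true.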
Note: Additional information is available through normal support channels.
Resolution
Hardware remediation options are available. Please contact your local Sun Services representative and reference this document.
Modification History
Date: 14-JAN-2005
- Updated Contributing Factors and Resolution sections
Date: 10-MAR-2005
- Updated Impact and Relief/Workaround sections
Date: 29-SEP-2005
Attachments
This solution has no attachment