Sun Fire 3800/4800/4810/6800 Servers May Experience an Outage Due to Failure of the Primary and Backup Redundant Transfer Switches
Category: Availability
Release Phase: Resolved
Product: Sun Fire 3800 Server, Sun Fire 4800 Server, Sun Fire 4810 Server, Sun Fire 6800 Server
Bug Id: 4533232, 4741828
Date of Resolved Release: 27-MAY-2003
Impact
A power outage may occur on Sun Fire 3800/4800/4810/6800 servers due to a blown fuse in the secondary Redundant Transfer Switch (RTS) when AC power is switched over from the primary RTS. The probability of this occurring is small. Based on failures seen to date, Sun expects the failure rate for this issue to be approximately 0.7%.
Contributing Factors
This issue can occur on the following platforms:
- Sun Fire 3800/4800/4810/6800 servers
This issue occurs only in the following circumstances:
- The system shipped between 16-Mar-2001 and 31-Aug-2002
- Or an RTS Field Replaceable Unit (FRU) 300-1396-05 (or lower) was installed
Notes:
1) Affected RTS units will have serial numbers lower than 20557. These units will have part number 300-1396-05 (or lower). Affected switches can be identified by physical inspection of their serial number label.
Sun Serial Number Example: 0000025-0132A16445
The "A" is the assembly code for the RTS and "16445" is the sequential serial number.
2) Some Sun Fire 3800/4800/4810/6800 systems will only contain one RTS per Redundant Transfer Unit (RTU). While not susceptible to this particular failure mode, they should still have the Resolution applied if they fall within the affected range.
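The criteria above can be sketched as a small check, for example when screening an inventory list before physical inspection. This is a minimal illustration, not a Sun-supplied tool. The label layout (digits, hyphen, digits, one-letter assembly code, sequential serial) is inferred only from the single example in Note 1, the ship-date window is assumed to be inclusive, and all function and constant names are hypothetical.

```python
import re
from datetime import date

# Thresholds taken from the advisory text.
SERIAL_CUTOFF = 20557             # affected units have sequential serials below this
AFFECTED_PART_BASE = "300-1396"   # RTS FRU part number without the dash level
MAX_AFFECTED_DASH = 5             # 300-1396-05 or lower
SHIP_START = date(2001, 3, 16)    # window assumed inclusive at both ends
SHIP_END = date(2002, 8, 31)

# Assumed label layout, inferred from the one example "0000025-0132A16445":
# digits, hyphen, digits, one-letter assembly code, sequential serial digits.
LABEL_RE = re.compile(r"^\d+-\d+([A-Z])(\d+)$")


def parse_rts_label(label):
    """Return (assembly_code, sequential_serial) from a serial number label."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized serial label: {label!r}")
    return match.group(1), int(match.group(2))


def serial_in_affected_range(label):
    """True if the sequential serial number is lower than the cutoff."""
    _, serial = parse_rts_label(label)
    return serial < SERIAL_CUTOFF


def part_number_affected(part_number):
    """True if the part number is 300-1396 at dash level 05 or lower."""
    base, _, dash = part_number.strip().rpartition("-")
    return (
        base == AFFECTED_PART_BASE
        and dash.isdigit()
        and int(dash) <= MAX_AFFECTED_DASH
    )


def shipped_in_window(ship_date):
    """True if the system shipped within the affected date range."""
    return SHIP_START <= ship_date <= SHIP_END
```

For the example label in Note 1, parse_rts_label returns the assembly code "A" and the serial 16445, which is below the cutoff of 20557. A label that does not match the assumed layout raises an error rather than being silently classified, so such units still need inspection by hand.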
Symptoms
When the described issue occurs, the primary (left-hand side) RTS loses power, and a failover is initiated to the secondary RTS, which has already failed. All LEDs on both the primary and secondary RTSs will be OFF and the RTU will no longer supply power to the system.
In some cases, a failure can occur such that a failed secondary RTS will appear as good, with the leftmost "AC present" LED ON and rightmost fault LED OFF when, in fact, the RTS is not functional.
A failed secondary (right-hand side) RTS module can be identified during system downtime by switching off the switch on the front of the primary (left-hand side) RTS. A loud snap will be heard as the relays in the RTS units transfer the load from primary to secondary. If the secondary RTS is good, it will pick up the load and provide AC power to the system. If the secondary RTS has failed, it will not provide AC power to the system, and its leftmost "AC present" LED will be OFF.
Workaround
A secondary RTS need not be present in an RTU enclosure for normal operation. If a failed secondary RTS is detected, it should be switched OFF until it can be replaced. Leaving the failed secondary RTS switched ON will make the system susceptible to failure from short (less than 20 ms) power drops that it would otherwise be able to tolerate.
Resolution
This issue is addressed with the implementation of a Field Change Order (FCO). Sun is currently finalizing a service implementation plan, and affected customers will be contacted by a Sun Services representative. If you are concerned that your operation may be affected by this issue, please contact your local Sun Services representative to discuss an action plan.