On Sun Fire 3800/4800/4810/6800, V1280, and Netra 1280 Domains, Time of Day (TOD) May Drift or Jump

| Category : | Availability |
| Release Phase : | Resolved |
| Product : | Sun Fire 3800 Server, Sun Fire 4800 Server, Sun Fire 4810 Server, Sun Fire 6800 Server, Sun Fire V1280 Server, Netra 1280 Server |
| Bug Id : | 4876369 |
| Date of Workaround Release : | 10-SEP-2003 |
| Date of Resolved Release : | 04-NOV-2003 |
Impact
On very rare occasions, the Time of Day (TOD) on Sun Fire 3800/4800/4810/6800, V1280, and Netra 1280 domains may drift or jump. As a result, any functionality that relies upon the System Controller (SC) timer may be inaccurate.
Contributing Factors
This issue can occur in the following releases:
SPARC Platform
- Sun Fire V1280 and Netra 1280 with firmware (ScApp) 5.13.0014 or earlier
- Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.12.x
- Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.13.x
- Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.14.x
- Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.15.0, 5.15.1 and 5.15.2
Note: Systems with firmware 5.11.x are not affected by this issue. Use the "showsc -v" command to display the firmware version of the SC.
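The version list above can be expressed as a small lookup. The following is a minimal sketch only: the function name "scapp_affected" and the platform labels "midframe" (Sun Fire 3800/4800/4810/6800) and "v1280" (Sun Fire V1280 and Netra 1280) are hypothetical, and the version string is assumed to have been read by hand from the "showsc -v" output. Any version this document does not mention is reported as "unknown" rather than guessed.

```sh
# Hypothetical helper (not part of any Sun tool): classify a ScApp
# version string against the releases listed in this document.
# Usage: scapp_affected <midframe|v1280> <version>
scapp_affected() {
    platform=$1
    version=$2
    case "$platform" in
    midframe)
        case "$version" in
        5.11.*)                 echo "not affected" ;;
        5.12.*|5.13.*|5.14.*)   echo "affected" ;;
        5.15.0|5.15.1|5.15.2)   echo "affected" ;;
        5.15.3)                 echo "not affected" ;;   # release carrying the fix
        *)                      echo "unknown" ;;
        esac
        ;;
    v1280)
        case "$version" in
        5.13.*)
            # Take the build number after "5.13." (for example 0014).
            build=`echo "$version" | sed 's/^5\.13\.//'`
            case "$build" in
            ''|*[!0-9]*) echo "unknown" ;;
            *)
                # expr strips the leading zeros before the comparison.
                if [ `expr "$build" + 0` -le 14 ]; then
                    echo "affected"
                else
                    echo "not affected"
                fi
                ;;
            esac
            ;;
        *)  echo "unknown" ;;
        esac
        ;;
    *)  echo "unknown" ;;
    esac
}
```

For example, "scapp_affected midframe 5.14.2" prints "affected", and "scapp_affected v1280 5.13.0015" prints "not affected".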
Symptoms
This issue may occur after 528 days of continuous SC uptime, at which point the TOD within a domain in the system may become random and unstable. The intervals reported have varied, but in general the TOD jumps backwards by anywhere from approximately one hour to as much as one month. The TOD as seen by the SC itself does not jump.
No specific messages indicate that this issue has occurred. It can be detected only when a domain exhibits unexpected behavior caused by its TOD changing.
Workaround
There are three options for avoiding this issue:
- Set the variable "tod_broken" to 1 in the domain kernel (see below), or
- Reboot the SCs before they reach 528 days of continuous uptime (rebooting at 500 days is recommended), or
- Install Patch 112884-04 (ScApp 5.15.3)
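The reboot option above depends on tracking SC uptime against the two thresholds in this document. The following is a minimal sketch of that check: the function name "sc_uptime_status" and its output labels are hypothetical, and the uptime in whole days is assumed to be supplied by the operator, since this document does not describe a way to query it from a script.

```sh
# Hypothetical helper: compare SC continuous uptime (in whole days)
# with the thresholds given in this document.
#   500 days: recommended point for a planned SC reboot
#   528 days: point after which the TOD issue may occur
# Usage: sc_uptime_status <days>
sc_uptime_status() {
    days=$1
    case "$days" in
    ''|*[!0-9]*)
        echo "invalid"
        return 2
        ;;
    esac
    if [ "$days" -ge 528 ]; then
        echo "at risk"
    elif [ "$days" -ge 500 ]; then
        echo "reboot due"
    else
        echo "ok"
    fi
}
```

For example, "sc_uptime_status 510" prints "reboot due".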
To work around the described issue in a running domain, immediate relief can be obtained by setting the variable "tod_broken" to 1 in the domain kernel. This will cause Solaris to ignore the clock data coming from the Serengeti clock driver and use a domain kernel timebase as a reference instead.
The following script can be invoked as "root" on the running domain to change the value of "tod_broken" in that domain's kernel:
#!/bin/sh
#
# Set tod_broken
#
echo "tod_broken ?W 1" | adb -w -k /dev/ksyms /dev/mem
#
exit 0
Additionally, adding the line "set tod_broken=1" to the domain's "/etc/system" configuration information file will sustain the value of the "tod_broken" variable across a reboot of the domain.
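The "/etc/system" change can be made by hand in an editor. As an illustration only, the sketch below appends the line once and keeps a copy of the original file. The function name "persist_tod_broken" and the ".bak" backup suffix are hypothetical, and the file path is passed explicitly so that the function can be tried on a copy before it is used on the real file.

```sh
# Hypothetical helper: add "set tod_broken=1" to a system file once.
# Usage: persist_tod_broken /etc/system
persist_tod_broken() {
    file=$1
    [ -f "$file" ] || return 1
    # Nothing to do if the setting is already present.
    if grep '^set tod_broken=1' "$file" >/dev/null 2>&1; then
        return 0
    fi
    # Keep a copy of the original before changing it.
    cp "$file" "$file.bak" || return 1
    echo "set tod_broken=1" >> "$file"
}
```

Running the function a second time leaves the file unchanged, so it is safe to repeat.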
At the next maintenance opportunity, the platform SCs should be rebooted. For systems with firmware 5.13 or later and failover configured, this can be accomplished by rebooting the spare SC first. After it has come up again and failover has become enabled and active, run the "setfailover force" command to make it the main SC, then reboot the other SC. When the other SC completes its reboot, running "setfailover force" again will restore it to the main SC state if desired.
For systems with firmware 5.12 or systems without failover enabled, it will be necessary to bring down any running domains before rebooting the SCs (Sun does not recommend rebooting a main SC with running domains as that action may disrupt domain operation).
Once the platform SCs have been rebooted, the domain TOD jumping will not recur for at least another 500 days of continuous SC uptime. The "set tod_broken=1" line can then be removed from the "/etc/system" file, and the variable reset to 0 in a running domain kernel by substituting 0 for 1 in the above script.
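Removing the setting from "/etc/system" after the SC reboot can likewise be done in an editor. The following is a minimal sketch under stated assumptions: the function name "unpersist_tod_broken" and the ".new" and ".bak" suffixes are hypothetical, and the file path is passed explicitly so that the function can be tried on a copy first.

```sh
# Hypothetical helper: remove any "set tod_broken=..." line from a
# system file, keeping a copy of the original.
# Usage: unpersist_tod_broken /etc/system
unpersist_tod_broken() {
    file=$1
    [ -f "$file" ] || return 1
    tmp="$file.new"
    # grep -v exits with 1 when no lines remain, which is not an error
    # here; only an exit status above 1 indicates a real failure.
    grep -v '^set tod_broken=' "$file" > "$tmp"
    [ $? -le 1 ] || { rm -f "$tmp"; return 1; }
    # Copy over the original so that its ownership and mode are kept.
    cp "$file" "$file.bak" && cp "$tmp" "$file" && rm -f "$tmp"
}
```

All other lines in the file, including comments, are left in their original order.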
Resolution
This issue is addressed in the following releases:
- Sun Fire V1280 and Netra 1280 with firmware (ScApp) 5.13.0015 (as delivered in patch 113751-05 or later)
- Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.15.3 (as delivered in patch 112884-04 or later)
Note: The patch must be added to both system controllers to remedy this issue.
Modification History
Date: 18-NOV-2004
- Firmware version 5.15.0 added to affected platforms in Contributing Factors
Date: 20-OCT-2004
- Correction made in "Relief/Workaround" section for statement to read: adding "set tod_broken=1" to the domain's "/etc/system" file
Date: 13-OCT-2004
- Updated Contributing Factors and Resolution sections by adding Sun Fire V1280 and Netra 1280 to affected platforms; added patch for fix
Date: 04-NOV-2003
- Updated Contributing Factors, Relief/Workaround, Symptoms and Resolution sections
- Re-released as Resolved