On Sun Fire 3800/4800/4810/6800, V1280, and Netra 1280 Domains, Time of Day (TOD) May Drift or Jump



Category: Availability
Release Phase: Resolved
Product: Sun Fire 3800 Server, Sun Fire 4800 Server, Sun Fire 4810 Server, Sun Fire 6800 Server, Sun Fire V1280 Server, Netra 1280 Server
Bug Id: 4876369
Date of Workaround Release: 10-SEP-2003
Date of Resolved Release: 04-NOV-2003


Impact

On very rare occasions, the Time of Day (TOD) clock in Sun Fire 3800/4800/4810/6800, V1280, and Netra 1280 domains may drift or jump. As a result, any functionality that relies on the System Controller (SC) timer may behave inaccurately.


Contributing Factors

This issue can occur in the following releases:

SPARC Platform

  • Sun Fire V1280 and Netra 1280 with firmware (ScApp) 5.13.0014 or earlier
  • Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.12.x
  • Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.13.x
  • Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.14.x
  • Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.15.0, 5.15.1 and 5.15.2

Note: Systems with firmware 5.11.x are not affected by this issue. Use the "showsc -v" command to display the firmware version of the SC.
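The version lists above can be checked mechanically. The following is a minimal sketch, not a Sun-supplied tool: the function name "scapp_status", the platform labels "midrange" and "v1280", and the assumption that the version is available as a plain string such as "5.15.2" or "5.13.0014" (copied by hand from the "showsc -v" output) are all hypothetical. It assumes a POSIX shell, and it reports "unknown" for any version this document does not mention.

```sh
#!/bin/sh
# Hypothetical helper: classify a ScApp version string against the lists
# in this document. Not part of any Sun tool.
# usage: scapp_status <midrange|v1280> <version>
scapp_status() {
    platform=$1
    ver=$2
    case "$platform" in
    midrange)   # Sun Fire 3800/4800/4810/6800
        case "$ver" in
        5.11.*) echo "not-affected" ;;
        5.12.*|5.13.*|5.14.*|5.15.0|5.15.1|5.15.2) echo "affected" ;;
        5.15.3) echo "not-affected" ;;   # release that carries the fix
        *) echo "unknown" ;;             # not listed in this document
        esac
        ;;
    v1280)      # Sun Fire V1280 and Netra 1280
        # Extract the build number of a 5.13.NNNN version, dropping leading zeros.
        build=$(printf '%s\n' "$ver" | sed -n 's/^5\.13\.0*\([0-9][0-9]*\)$/\1/p')
        if [ -z "$build" ]; then
            echo "unknown"               # only 5.13.NNNN is handled here
        elif [ "$build" -le 14 ]; then
            echo "affected"              # 5.13.0014 or earlier
        else
            echo "not-affected"          # 5.13.0015 or later build
        fi
        ;;
    *)
        echo "unknown"
        ;;
    esac
}
```

For example, "scapp_status midrange 5.14.2" prints "affected", and "scapp_status v1280 5.13.0015" prints "not-affected".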


Symptoms

This issue may occur once an SC has been up continuously for 528 days, after which the TOD within a domain in the system may become erratic and unstable. The reported intervals vary, but the TOD generally jumps backwards by anywhere from approximately one hour to as much as one month. The TOD as seen by the SC itself does not jump.

No specific messages indicate that this issue has occurred. It can only be detected when a domain behaves unexpectedly because its TOD has changed.
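Because nothing is logged when the jump happens, one possible approach is to sample the domain clock periodically and flag any sample that is earlier than the previous one. The sketch below is an illustration only: the function names "tod_jumped_back" and "watch_tod", the 60-second slack, and the 300-second polling interval are assumptions, and it relies on "perl" and "logger" being available on the domain. Legitimate clock steps (for example, an administrator resetting the date) would also be flagged.

```sh
#!/bin/sh
# Hypothetical helpers: detect a backward step in the domain TOD.

# Succeeds (exit 0) when sample NOW is earlier than sample PREV by more
# than SLACK seconds (default 60). All values are seconds since the epoch.
# usage: tod_jumped_back PREV NOW [SLACK]
tod_jumped_back() {
    prev=$1
    now=$2
    slack=${3:-60}
    [ "$now" -lt $(( $prev - $slack )) ]
}

# Polls the domain clock every INTERVAL seconds (default 300) and logs a
# warning through syslog whenever the clock has moved backwards.
# usage: watch_tod [INTERVAL]
watch_tod() {
    interval=${1:-300}
    last=$(perl -e 'print time')
    while sleep "$interval"; do
        cur=$(perl -e 'print time')
        if tod_jumped_back "$last" "$cur"; then
            logger -p daemon.warning "domain TOD moved backwards: $last -> $cur"
        fi
        last=$cur
    done
}
```

Running "watch_tod 300 &" as part of a monitoring script would start the poll loop in the background. A jump of one hour or more, as described above, is far outside the 60-second slack.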


Workaround

Any one of the following three options avoids this issue:

  1. Set the variable "tod_broken" to 1 in the domain kernel (see below), or
  2. Reboot the SCs before 528 days of continuous SC uptime (recommended at 500 days), or
  3. Install Patch 112884-04 (ScApp 5.15.3) on Sun Fire 3800/4800/4810/6800, or Patch 113751-05 (ScApp 5.13.0015) on Sun Fire V1280 and Netra 1280

For immediate relief in a running domain, set the variable "tod_broken" to 1 in the domain kernel. This causes Solaris to ignore the clock data coming from the Serengeti clock driver and to use a domain kernel timebase as its reference instead.

The following script can be invoked as "root" on the running domain to change the value of "tod_broken" in that domain's kernel:

    #!/bin/sh
    #
    # Set tod_broken
    #
    echo "tod_broken ?W 1" | adb -w -k /dev/ksyms /dev/mem
    #
    exit 0

Additionally, adding the line "set tod_broken=1" to the domain's "/etc/system" configuration file preserves the value of the "tod_broken" variable across reboots of the domain.
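The "/etc/system" change can be made idempotent so that repeated runs do not append duplicate lines. This is a minimal sketch under stated assumptions: the function name "persist_tod_broken" and the backup suffix ".pre-tod_broken" are hypothetical, and the "[[:space:]]" pattern assumes a POSIX-conforming grep (on Solaris, possibly "/usr/xpg4/bin/grep" rather than "/usr/bin/grep").

```sh
#!/bin/sh
# Hypothetical helper: append "set tod_broken=1" to /etc/system once.
# usage: persist_tod_broken [file]    (file defaults to /etc/system)
persist_tod_broken() {
    sysfile=${1:-/etc/system}

    # Nothing to do if an active (uncommented) setting is already present.
    if grep '^[[:space:]]*set[[:space:]][[:space:]]*tod_broken[[:space:]]*=[[:space:]]*1' \
        "$sysfile" >/dev/null 2>&1; then
        return 0
    fi

    # Keep a copy of the original file before modifying it.
    cp -p "$sysfile" "$sysfile.pre-tod_broken" || return 1

    echo "set tod_broken=1" >> "$sysfile"
}
```

Run as "root" on the domain, "persist_tod_broken" with no argument edits "/etc/system". Passing a scratch file name instead allows a dry run.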

At the next maintenance opportunity, the platform SCs should be rebooted. For systems with firmware 5.13 or later and failover configured, this can be accomplished by rebooting the spare SC first. After it has come up again and failover has become enabled and active, run the "setfailover force" command to make it the main SC, then reboot the other SC. When the other SC completes its reboot, running "setfailover force" again will restore it to the main SC state if desired.

For systems with firmware 5.12 or systems without failover enabled, it will be necessary to bring down any running domains before rebooting the SCs (Sun does not recommend rebooting a main SC with running domains as that action may disrupt domain operation).

Once the platform SCs have been rebooted, the domain TOD jumping will not recur for at least another 500 days of continuous SC uptime. The "set tod_broken=1" line can then be removed from the "/etc/system" file, and the variable reset to 0 in a running domain kernel by substituting 0 for 1 in the above script.
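The 500-day recommendation and the 528-day threshold lend themselves to a simple scheduled check. The sketch below is illustrative only: the function name "sc_uptime_status" is hypothetical, and it assumes the number of days of continuous SC uptime is already known (for example, from site maintenance records) and passed in as a whole number.

```sh
#!/bin/sh
# Hypothetical helper: compare SC continuous uptime (in whole days)
# against the 500-day recommendation and the 528-day threshold.
# usage: sc_uptime_status DAYS
sc_uptime_status() {
    days=$1
    case "$days" in
    ''|*[!0-9]*)
        echo "invalid"          # not a non-negative whole number
        return 2
        ;;
    esac
    if [ "$days" -ge 528 ]; then
        echo "at-risk"              # TOD jumps may already occur
    elif [ "$days" -ge 500 ]; then
        echo "reboot-recommended"   # inside the recommended reboot window
    else
        echo "ok"
    fi
}
```

For example, "sc_uptime_status 510" prints "reboot-recommended", which would be the cue to schedule the SC reboot procedure described above.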


Resolution

This issue is addressed in the following releases:

  • Sun Fire V1280 and Netra 1280 with firmware (ScApp) 5.13.0015 (as delivered in patch 113751-05 or later)
  • Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.15.3 (as delivered in patch 112884-04 or later)

Note: The patch must be applied to both system controllers to remedy this issue.




Modification History


Date: 18-NOV-2004
  • Firmware version 5.15.0 added to affected platforms in Contributing Factors

Date: 20-OCT-2004
  • Correction made in "Relief/Workaround" section for statement to read: adding "set tod_broken=1" to the domain's "/etc/system" file

Date: 13-OCT-2004
  • Updated Contributing Factors and Resolution sections by adding Sun Fire V1280 and Netra 1280 to affected platforms; add patch for fix

Date: 04-NOV-2003
  • Update Contributing Factors, Relief/Workaround, Symptoms and Resolution sections
  • Re-release as Resolved



Attachments
This solution has no attachment

 
 

Article Details
Article ID : 200817
Article Type : Sun Alert
Last reviewed : 2003-11-04
Audience : PUBLIC