CPU0/CPU1 May Be Disabled on Sun Fire 12K/15K System Boards Resulting in Domain Interruption



Category : Availability
Release Phase : Resolved
Product : Sun Fire 12K Server, Sun Fire 15K Server
Bug Id : 4830870, 4865526
Date of Workaround Release : 18-APR-2003
Date of Resolved Release : 08-JAN-2004


Impact

System Management Software (SMS) may disable the CPU0/CPU1 pair on Sun Fire 12K/15K System Boards due to a false over-voltage reading for CPU1. There will be a domain interruption while SMS brings down the domain, removes the CPU0/CPU1 pair from service, and brings the domain back up.


Contributing Factors

This issue can occur in the following releases:

SPARC Platform

  • Sun Fire 12K/15K with SMS 1.1
  • Sun Fire 12K/15K with SMS 1.2 without patch 112481-15
  • Sun Fire 12K/15K with SMS 1.3 without patch 114640-08

Symptoms

When this issue occurs, SMS disables the CPU0/CPU1 pair on one of the System Boards. The SMS "showcomponent" command then shows CPU0/CPU1 (PP0) as blacklisted:

	% showcomponent -a
	Component PROCPAIR at SB6/PP0 is disabled in specified blacklist:
	# ESMD High-Maximum Voltage 0313.1238.15

The following type of message will appear in the SMS log file "/var/opt/SUNWSMS/adm/platform/messages":

	Apr  8 18:49:49 2003 ... esmd[630]: [...] A high voltage has been 
	     detected on Core1, located on CPU at SB6. The voltage detected is 
	     1.45v; should be 1.31v to 1.44v. PROCPAIR at SB6/PP0 is being 
	     removed from the domain and powered off. Check all hardware for 
	     the cause.
	Apr  8 18:49:50 2003 ... esmd[630]: [...] Component PROCPAIR at 
	     SB6/PP0 has been blacklisted

Note: This issue only occurs for CPU1, due to its position on the System Board.
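To confirm how far outside the allowed range a reported reading was, the fields of the esmd message can be pulled out with a short script. The following is a minimal Python sketch, not part of SMS. It assumes that each message occupies a single line in the platform messages file (the sample above is wrapped for display) and that the wording matches the sample exactly. The function name parse_high_voltage is hypothetical.

```python
import re

# Pattern built from the wording of the sample esmd message above.
# Assumption: the real log keeps each message on one line.
HIGH_VOLTAGE = re.compile(
    r"A high voltage has been detected on Core(?P<core>\d+), "
    r"located on CPU at (?P<board>SB\d+)\. "
    r"The voltage detected is (?P<seen>\d+\.\d+)v; "
    r"should be (?P<low>\d+\.\d+)v to (?P<high>\d+\.\d+)v\."
)


def parse_high_voltage(line):
    """Return the fields of one esmd high-voltage line, or None if no match."""
    match = HIGH_VOLTAGE.search(line)
    if match is None:
        return None
    return {
        "core": int(match["core"]),
        "board": match["board"],
        "seen": float(match["seen"]),
        "low": float(match["low"]),
        "high": float(match["high"]),
    }


# The sample message from this document, joined into a single line.
# The elided parts ("..." and "[...]") are kept exactly as shown.
sample = (
    "Apr  8 18:49:49 2003 ... esmd[630]: [...] A high voltage has been "
    "detected on Core1, located on CPU at SB6. The voltage detected is "
    "1.45v; should be 1.31v to 1.44v. PROCPAIR at SB6/PP0 is being "
    "removed from the domain and powered off. Check all hardware for "
    "the cause."
)

event = parse_high_voltage(sample)
# Amount by which the reading exceeded the upper limit, in volts.
overshoot = round(event["seen"] - event["high"], 2)
print(event["board"], "Core", event["core"], "over limit by", overshoot, "V")
```

In the sample, the reading exceeds the upper limit by only a small margin, which is consistent with the false over-voltage reading described under Impact.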


Workaround

If CPU0/CPU1 have been disabled, they can be re-enabled with:

	# enablecomponent -a SB6/PP0

Note: Substitute the System Board and processor pair values reported by "showcomponent" for SB6/PP0.


Resolution

This issue is addressed in the following releases:

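SPARC Platform

  • Sun Fire 12K/15K with SMS 1.2 with patch 112481-15 or later
  • Sun Fire 12K/15K with SMS 1.3 with patch 114640-08 or later

Note: SMS 1.1 requires an upgrade to a later release with appropriate patches.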

The above resolution changes SMS behavior. When a processor core voltage is detected above or below a warning threshold, a message is logged to the SMS platform messages file (/var/opt/SUNWSMS/adm/platform/messages). The board continues to run and the domain is unaffected. An example warning message is shown below. Such messages are most easily found by searching the platform messages file for the string "Core".

    Dec 17 16:41:45 2003 platform-sc0 esmd[564]: [0 108891482234403 
           ERR DetectorV.cc 645] A high voltage has been detected on Core3, 
           located on CPU at SB7. The voltage detected is 1.48v; should be 1.31v 
           to 1.47v. PROCPAIR at SB7/PP1 will be removed from the domain and 
           powered off if it rises above 1.65v.
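When reviewing a platform messages file that spans both the old and the patched behavior, it can help to separate messages that only warn from messages that report a processor pair being removed. The Python sketch below is illustrative only. It assumes one message per line and relies solely on the wording of the two high-voltage samples in this document. The wording of low-voltage messages is not shown here, so those are not matched. The function name classify is hypothetical.

```python
import re

# Wording taken from the pre-patch sample: the pair is removed immediately.
REMOVED = re.compile(r"PROCPAIR at (?P<loc>SB\d+/PP\d+) is being removed")

# Wording taken from the patched sample: warning only, with a power-off limit.
WARNED = re.compile(
    r"PROCPAIR at (?P<loc>SB\d+/PP\d+) will be removed from the domain "
    r"and powered off if it rises above (?P<limit>\d+\.\d+)v"
)


def classify(line):
    """Return ("warning", loc, limit), ("removed", loc, None), or None."""
    if "Core" not in line:  # same search string the text recommends
        return None
    match = WARNED.search(line)
    if match:
        return ("warning", match["loc"], float(match["limit"]))
    match = REMOVED.search(line)
    if match:
        return ("removed", match["loc"], None)
    return None


# Both sample messages from this document, each joined into a single line.
old_style = (
    "Apr  8 18:49:49 2003 ... esmd[630]: [...] A high voltage has been "
    "detected on Core1, located on CPU at SB6. The voltage detected is "
    "1.45v; should be 1.31v to 1.44v. PROCPAIR at SB6/PP0 is being "
    "removed from the domain and powered off. Check all hardware for "
    "the cause."
)
new_style = (
    "Dec 17 16:41:45 2003 platform-sc0 esmd[564]: [0 108891482234403 "
    "ERR DetectorV.cc 645] A high voltage has been detected on Core3, "
    "located on CPU at SB7. The voltage detected is 1.48v; should be 1.31v "
    "to 1.47v. PROCPAIR at SB7/PP1 will be removed from the domain and "
    "powered off if it rises above 1.65v."
)

for line in (old_style, new_style):
    print(classify(line))
```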

The warning condition can also be observed with the SMS "showenvironment" command. For example:

  % showenvironment -p volts | grep Core
  CPU at SB6       pcf8591   Core 0 Volt      1.65    V     55.8  sec  OK
  CPU at SB6       pcf8591   Core 1 Volt      1.65    V     55.8  sec  OK
  CPU at SB6       pcf8591   Core 2 Volt      1.63    V     55.8  sec  OK
  CPU at SB6       pcf8591   Core 3 Volt      1.63    V     55.8  sec  OK
  CPU at SB7       pcf8591   Core 0 Volt      1.40    V     45.6  sec  OK
  CPU at SB7       pcf8591   Core 1 Volt      1.43    V     45.6  sec  OK
  CPU at SB7       pcf8591   Core 2 Volt      1.41    V     45.6  sec  OK
  CPU at SB7       pcf8591   Core 3 Volt      1.48    V     45.6  sec  HIGH_WARN  <---Note the high warning
  CPU at SB8       pcf8591   Core 0 Volt      1.66    V     55.1  sec  OK
  CPU at SB8       pcf8591   Core 1 Volt      1.64    V     55.1  sec  OK
  CPU at SB8       pcf8591   Core 2 Volt      1.64    V     55.1  sec  OK
  CPU at SB8       pcf8591   Core 3 Volt      1.63    V     55.1  sec  OK

When either a high or low warning condition is found, contact your authorized Sun Services Representative as soon as the condition is detected to have the System Board replaced.
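On a platform with many System Boards, the core voltage rows that are not "OK" can be picked out of saved "showenvironment -p volts" output with a short script. The following Python sketch is an illustration under stated assumptions: the column layout matches the sample above, the status is the last field on the same line as the reading, and any status other than "OK" is worth reporting. The function name non_ok_cores is hypothetical.

```python
import re

# One core-voltage row, based on the column layout of the sample output.
# The device column and the value before "sec" are skipped, not interpreted.
ROW = re.compile(
    r"^\s*CPU at (?P<board>SB\d+)\s+\S+\s+Core (?P<core>\d+) Volt\s+"
    r"(?P<volts>\d+\.\d+)\s+V\s+\S+\s+sec\s+(?P<status>\S+)"
)


def non_ok_cores(output):
    """Return (board, core, volts, status) for every row whose status is not OK."""
    flagged = []
    for line in output.splitlines():
        match = ROW.match(line)
        if match and match["status"] != "OK":
            flagged.append(
                (match["board"], int(match["core"]),
                 float(match["volts"]), match["status"])
            )
    return flagged


# Rows copied from the sample output in this document.
saved_output = """\
  CPU at SB6       pcf8591   Core 0 Volt      1.65    V     55.8  sec  OK
  CPU at SB7       pcf8591   Core 2 Volt      1.41    V     45.6  sec  OK
  CPU at SB7       pcf8591   Core 3 Volt      1.48    V     45.6  sec  HIGH_WARN
  CPU at SB8       pcf8591   Core 0 Volt      1.66    V     55.1  sec  OK
"""

for board, core, volts, status in non_ok_cores(saved_output):
    print(f"{board} Core {core}: {volts} V ({status})")
```

Because the check is "anything other than OK", the sketch does not need to know the exact status string used for a low warning, which this document does not show.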




Modification History


Date: 30-APR-2003
  • Updated Impact
  • Updated Symptoms
  • Updated Synopsis

Date: 14-MAY-2003
  • Updated Relief/Workaround section

Date: 15-MAY-2003
  • Updated Relief/Workaround section

Date: 04-JUN-2003
  • Updated Relief/Workaround section

Date: 08-JAN-2004
  • Updated Avoidance
  • Updated State: Resolved
  • Updated Contributing Factors, Relief/Workaround and Resolution sections

Date: 15-JAN-2004
  • Updated Resolution



Article Details
Article ID : 201208
Article Type : Sun Alert
Last reviewed : 2004-01-08
Audience : PUBLIC