Use of cfgadm(1M) on Certain Systems May Cause Domain Outage, Reporting "L2CheckError"



Category: Availability
Release Phase: Resolved
Product: Sun Fire 3800 Server
         Sun Fire 4800 Server
         Sun Fire 4810 Server
         Sun Fire 6800 Server
         Sun Fire E6900 Server
         Sun Fire E2900 Server
         Sun Fire V1280 Server
         Sun Fire E4900 Server
Bug Id: 6300392
Date of Workaround Release: 05-AUG-2005
Date of Resolved Release: 10-FEB-2006


Impact

Use of the cfgadm(1M) command can trigger a domain outage with an "L2CheckError." A loss of application availability due to a system pause from this condition may be misdiagnosed and lead to unnecessary hardware replacement.


Contributing Factors

This issue can occur on the following platforms:

  • Sun Fire 3800, 4800, 4810, E2900, E4900, 6800, E6900 and V1280 systems without ScApp firmware 5.19.8 or 5.20.3 (as delivered in patches 114526-09 and 114527-04).

Notes:

  1. This issue may occur on the systems listed above running Solaris 8, 9 or 10. Solaris 7 does not support the x800/x900 series of Sun Fire systems.
  2. This issue occurs only on systems configured for Dynamic Reconfiguration (DR).

For example, using cfgadm(1M) to configure a system board can trigger this condition:

    # cfgadm -c configure N0.SB2

(See the "Symptoms" section for the error messages generated.)

To determine the version of ScApp on a system, run the following command from the platform shell:

    sc0:SC> showsc
    ...
    ScApp version: 5.19.4 Build_01
    RTOS version: 45
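
As an illustration only, the version check can be scripted once the "showsc" output has been captured as text. The Python sketch below is not part of this Sun Alert: the function names parse_scapp_version and has_fix are hypothetical, and it assumes the "ScApp version:" line has the format shown above. It also assumes that any later release on the 5.19.x or 5.20.x branch carries the fix, and it deliberately returns no answer for other branches, which this alert does not cover.

```python
import re

# Fixed ScApp releases named in this Sun Alert, keyed by (major, minor) branch.
# Assumption: later releases on the same branch also contain the fix.
FIXED_RELEASES = {(5, 19): 8, (5, 20): 3}


def parse_scapp_version(showsc_output):
    """Return the ScApp version as a tuple of ints, or None if not found."""
    match = re.search(r"ScApp version:\s*(\d+)\.(\d+)\.(\d+)", showsc_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def has_fix(version):
    """Return True/False for the 5.19.x and 5.20.x branches, else None."""
    major, minor, micro = version
    minimum = FIXED_RELEASES.get((major, minor))
    if minimum is None:
        return None  # branch not covered by the alert; do not guess
    return micro >= minimum


# Sample text copied from the showsc example above.
sample = """ScApp version: 5.19.4 Build_01
RTOS version: 45"""

version = parse_scapp_version(sample)
print(version, has_fix(version))  # (5, 19, 4) False
```

For the sample output above, the sketch reports that ScApp 5.19.4 predates the fixed 5.19.8 release.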

 


Symptoms

Output from the "showerrorbuffer" command will display captured error messages similar to the following:

    ErrorData[19]
      Date: Mon Jun 13 20:55:01 GMT-07:00 2005
      Device: /SSC0/sbbc0/systemepld
      Register: FirstError[0x10] : 0x0800
            SB2 encountered the first error
    ErrorData[20]
      Date: Mon Jun 13 20:55:01 GMT-07:00 2005
      Device: /partition0/domain0/SB2/bbcGroup0/repeaterepld
      Register: FirstError[0x10]: 0x0001
            ar0 encountered the first error
    ErrorData[21]
      Date: Mon Jun 13 20:55:01 GMT-07:00 2005
      Device: /partition0/domain0/SB2/ar0
      ErrorID: 0x10221fff
      Register: L2CheckError[0x6150] : 0x00001e00
             CMDVSyncErr [12:09] : 0xf Ports [9:6] command valid mismatched 
             against internal expected command valid
    ErrorData[22]
      Date: Mon Jun 13 20:55:01 GMT-07:00 2005
      Device: /partition0/domain0/SB2/ar0
      ErrorID: 0x10221fff
      Register: L2CheckError[0x6150] : 0x0000001e
             PreqSyncErr [04:01] : 0xf Ports [9:6] prereq mismatched 
             against internal expected prereq
    ErrorData[23]
      Date: Mon Jun 13 20:55:01 GMT-07:00 2005
      Device: /partition0/domain0/SB2/ar0
      ErrorID: 0x10221fff
      Register: L2CheckError[0x6150] : 0x1e000000
          AccCMDVSyncErr [28:25] : 0xf accumulated valid command mismatch
    ErrorData[24]
      Date: Mon Jun 13 20:55:01 GMT-07:00 2005
      Device: /partition0/domain0/SB2/ar0
      ErrorID: 0x10221fff
      Register: L2CheckError[0x6150] : 0x001e0000
          AccPreqSyncErr [20:17] : 0xf accumulated prerequisite mismatch

and from the output of the "showlogs -d <domain name>" command for the same error:

    Jun 13 20:55:01 g1db1-sc0 Domain-A.SC: [ID 427805 local0.crit] ErrorMonitor:
    Domain A has a SYSTEM ERROR
    Jun 13 20:55:01 g1db1-sc0 Domain-A.SC: [ID 924577 local0.error] /N0/SB2 
    encountered the first error
    Jun 13 20:55:01 g1db1-sc0 Domain-A.SC: [ID 175522 local0.error] ArAsic 
    reported first error on /N0/SB2
    Jun 13 20:55:01 g1db1-sc0 Domain-A.SC: [ID 653352 local0.error]
    /partition0/domain0/SB2/ar0:

    >>>>>> L2CheckError[0x6150] : 0x1e1e9e1e

         CMDVSyncErr [12:09] : 0xf Ports [9:6] command valid mismatched against 
         internal expected command valid
         PreqSyncErr [04:01] : 0xf Ports [9:6] prereq mismatched against 
         internal expected prereq
      AccCMDVSyncErr [28:25] : 0xf accumulated valid command mismatch
                  FE [15:15] : 0x1
      AccPreqSyncErr [20:17] : 0xf accumulated prerequisite mismatch
        
    Jun 13 20:55:01 g1db1-sc0 Domain-A.SC: [ID 250001 local0.error] 
    [AD] Event: SF4800
         CSN: 229H2199 DomainID: A ADInfo: 1.SCAPP.15.4
         Time: Mon Jun 13 20:55:01 GMT-07:00 2005
         FRU-List-Count: 0; FRU-PN:  ; FRU-SN:  ; FRU-LOC: UNRESOLVED
         Recommended-Action: Service action required
 
    Jun 13 20:55:01 g1db1-sc0 Domain-A.SC: [ID 253130 local0.crit] Domain A is 
    currently paused due to an error.  
    This domain must be turned off via "setkeyswitch off" to recover
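
The combined register value in the "showlogs" output (0x1e1e9e1e) is the bitwise OR of the fields reported separately by "showerrorbuffer", plus the FE bit. The following Python sketch, offered as an illustration rather than a Sun-supplied tool, extracts each field using only the bit ranges printed in the log output above. The names L2_CHECK_ERROR_FIELDS, extract_field and decode_l2_check_error are hypothetical, and bits not listed in the log output are ignored.

```python
# Bit ranges (high, low) exactly as printed in the log output above.
L2_CHECK_ERROR_FIELDS = {
    "AccCMDVSyncErr": (28, 25),
    "AccPreqSyncErr": (20, 17),
    "FE": (15, 15),
    "CMDVSyncErr": (12, 9),
    "PreqSyncErr": (4, 1),
}


def extract_field(value, high, low):
    """Return bits [high:low] of value, shifted down to bit 0."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def decode_l2_check_error(value):
    """Map each known field name to its value within the register."""
    return {
        name: extract_field(value, high, low)
        for name, (high, low) in L2_CHECK_ERROR_FIELDS.items()
    }


decoded = decode_l2_check_error(0x1E1E9E1E)
for name, field in decoded.items():
    print(f"{name:>14}: {field:#x}")
```

Run against 0x1e1e9e1e, every four-bit field decodes to 0xf and FE decodes to 0x1, matching the values shown in the log.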

Workaround

To work around the described issue, use one of the two following options:

a) Reboot the main system controller

or:

b) Manually fail over the main system controller

Details on failing over a system controller are beyond the scope of this Sun Alert. They can be found in the "Sun Fire Midrange Systems Platform Administration Manual" (#817-2971-10) at http://docs.sun.com/app/docs?q=817-2971-10.


Resolution

This issue is addressed on the following platforms:

  • Sun Fire 3800, 4800, 4810, E2900, E4900, 6800, E6900 and V1280 systems with ScApp firmware 5.19.8 or 5.20.3 (as delivered in patches 114526-09 or later and 114527-04 or later)



Modification History


Date: 25-OCT-2005

  • Updated Contributing Factors

Date: 10-FEB-2006

  • Updated Impact, Contributing Factors and Resolution sections; re-released as Resolved

Date: 05-DEC-2006

  • Updated Contributing Factors and Resolution sections




 
 
Article Details
Article ID : 201760
Article Type : Sun Alert
Last reviewed : 2006-12-21
Audience : PUBLIC