System May Hang or Panic Accompanied by "lpost" Messages


Status: Issued

Description
Sun(sm) Alert Notification
  • Sun Alert ID: 101673 (formerly 57765)
  • Synopsis: System May Hang or Panic Accompanied by "lpost" Messages
  • Category: Availability
  • Product: Sun Fire 3800 Server, Sun Fire 4800 Server, Sun Fire 4810 Server, Sun Fire 6800 Server, Sun Fire E6900 Server, Sun Fire E2900 Server, Sun Fire V1280 Server, Sun Fire E4900 Server
  • BugIDs: 4978865, 5054736
  • Avoidance: Patch
  • State: Resolved
  • Date Released: 25-Apr-2005, 01-Aug-2005
  • Date Closed: 01-Aug-2005
  • Date Modified: 01-Aug-2005, 05-Dec-2005, 22-Mar-2006, 23-Aug-2006

1. Impact

False indications of hardware failure may be misdiagnosed as genuine faults, and the resulting remedial action may lead to unnecessary hardware replacement. Application availability may be lost due to a system panic or hang, either of which may require a "setkeyswitch" cycle to recover.

2. Contributing Factors

This issue can occur on the following platforms:

  • Sun Fire 2900, 3800, 4800, 4810, 4900, 6800, 6900 and V1280 servers with System Controller (ScApp) firmware versions 5.18.x and earlier without ScApp firmware patch 114526-01, and domains running Solaris 9 without Kernel update patch 117171-14

Notes:

  1. Solaris 7, Solaris 8 and Solaris 10 are not affected by this issue. The Solaris x86 platform is not affected by this issue.
  2. In some cases, use of prtdiag(1M) has been shown to trigger false indications of system hardware failure.

3. Symptoms

Should the described issue occur, the system may present false indications of system hardware failure. In most cases there is little or no information in showlogs, showerrorbuffer or domain messages to indicate an error. The only indicators are the "WARNING: Asynchronous Event" message in the console together with a system hang or panic. "Time-out from system bus" (TO) and/or "Privileged code access error(s)" (PRIV) messages may also be displayed.

Asynchronous event "lpost" messages or panic messages similar to the following examples (from the platform loghost output) may appear during routine shutdown or reboot:

    {/N0/SB1/P2} WARNING: Asynchronous Event.
    {/N0/SB1/P2} Component under test: /N0/SB1/P2 CPU
    {/N0/SB1/P2}     Unexpected event occurred
    {/N0/SB1/P2} Ino = 00000000.00000000
    {/N0/SB1/P2}  tl  tt         tstate                 tpc               tnpc
    {/N0/SB1/P2} 01  60  00000044.80000604  000007ff.f000bd3c  000007ff.f000bd40
    {/N0/SB1/P2} AFSR = 00000000.00000000
    {/N0/SB1/P2} AFAR = 00000028.04001800
    {/N0/SB1/P2} IMMU SFSR = 00000000.00000000
    {/N0/SB1/P2} DMMU SFSR = 00000000.00000000
    {/N0/SB1/P2} DMMU SFAR = 00000300.14821480
    {/N0/SB1/P2} PState = 00000000.00000814
    {/N0/SB1/P2} Dispatch Control =00000000.00000000
    {/N0/SB1/P2} Data Cache Unit Control =00000000.00000000
    {/N0/SB1/P2} Safari Config. = 0aaa0028.200c0006
    {/N0/SB1/P2} EState = 00000000.0000000b
    {/N0/SB1/P2}  tl  tt         tstate                 tpc               tnpc
    {/N0/SB1/P2} 02  32  00000099.80081402  000007ff.f0006cc0  000007ff.f0006cc4
    {/N0/SB1/P2} 01  60  00000044.80000604  000007ff.f000bd3c  000007ff.f000bd40
    {/N0/SB1/P2}    (TO) Time-out from system bus
    {/N0/SB1/P2}    (PRIV) Privileged code access error(s)

This second example displays another variation of an "Asynchronous Event" message from lpost:

    {/N0/SB4/P1} @(#) lpost 5.17.2 2004/08/13 11:53
    {/N0/SB4/P1} Copyright 2001-2004 Sun Microsystems, Inc. All rights reserved.
    {/N0/SB4/P1} Use is subject to license terms.
    {/N0/SB4/P1} test case reset reason = 00000000.0404ff07
    {/N0/SB4/P1} test case ecache_size=00000000.00800000, tag_size=00000000.00004000
    {/N0/SB4/P1} test case Ecache Mode: 0:3:3
    {/N0/SB4/P1} test case E$ control register = 00000000.00094400
    {/N0/SB4/P1} @(#) lpost 5.17.2 2004/08/13 11:53
    {/N0/SB4/P1} Copyright 2001-2004 Sun Microsystems, Inc. All rights reserved.
    {/N0/SB4/P1} Use is subject to license terms.
    {/N0/SB4/P1} test case reset reason = 00000004.04ff0707
    {/N0/SB4/P1} test case ecache_size=00000000.00800000, tag_size=00000000.00004000
    {/N0/SB4/P1} test case Ecache Mode: 0:3:3
    {/N0/SB4/P1} test case E$ control register = 00000000.00094400
    {/N0/SB4/P1} test case IoSram Add : 0000041c.00900000
    {/N0/SB4/P1} WARNING: Asynchronous Event.
    {/N0/SB4/P1} Component under test: /N0/SB4/P1 CPU
    {/N0/SB4/P1} Task 00000000.00037144 does not exist

The changes made for bug 4988128 replaced the ERROR message format with the WARNING message format in firmware revisions 5.15.5, 5.16.1, 5.17.1, 5.18.0 and higher. This third example displays an "ERROR" message from lpost in the older format:

    {/N0/SB1/P2} Use is subject to license terms.
    {/N0/SB1/P2} test case reset reason = 00000001.04ff0707
    {/N0/SB1/P2} test case ecache_size=00000000.00800000, tag_size=00000000.00004000
    {/N0/SB1/P2} test case E$ control register = 00000000.07c55400
    {/N0/SB1/P2} test case IoSram Add : 00000420.00900000
    {/N0/SB1/P2} ERROR: TEST=Dummy,SUBTEST=Slave Test ID=0.0
    {/N0/SB1/P2} Component under test: /N0/SB1/P2 CPU
    {/N0/SB1/P2} Task 00000000.000374a8 does not exist
    {/N0/SB1/P2} @(#) lpost 5.15.3 2003/09/30 23:01
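
Because these messages are often the only trace of the issue, it can help to search saved loghost output for them. The following is a minimal sketch rather than a supported tool. It assumes the log has already been read into a list of lines and that each line carries the "{component}" prefix shown in the examples above. The function name find_lpost_events and the choice of marker strings are illustrative and are not part of any Sun utility.

```python
import re

# Matches the "{/N0/SB1/P2} message" prefix used in the examples above.
LINE_RE = re.compile(r"^\s*\{(?P<component>[^}]+)\}\s?(?P<message>.*)$")

# Message prefixes that mark an lpost event in the examples above
# (the newer WARNING format and the older ERROR format).
EVENT_MARKERS = ("WARNING: Asynchronous Event", "ERROR: TEST=Dummy")


def find_lpost_events(lines):
    """Return (component, message) pairs for lines that look like lpost events."""
    events = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a "{component} ..." line
        message = match.group("message").strip()
        if message.startswith(EVENT_MARKERS):
            events.append((match.group("component"), message))
    return events


# Lines copied from the examples above.
sample = [
    "{/N0/SB4/P1} test case IoSram Add : 0000041c.00900000",
    "{/N0/SB4/P1} WARNING: Asynchronous Event.",
    "{/N0/SB4/P1} Component under test: /N0/SB4/P1 CPU",
    "{/N0/SB1/P2} ERROR: TEST=Dummy,SUBTEST=Slave Test ID=0.0",
]
for component, message in find_lpost_events(sample):
    print(component, "->", message)
```

A match only shows that an lpost event was logged. It does not by itself distinguish this issue from a genuine hardware fault.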

A second scenario is a system panic, also accompanied by one of the above types of error messages reported in the console logs. System recovery is via panic reboot. Panic messages vary; some examples are:

Example 1:

    panic: failed to stop cpu5

    panic[cpu6]/thread=30005c537c0: bad kernel MMU trap at TL 2

    %tl %tpc              %tnpc             %tstate           %tt
     1  000000000101819c  00000000010181a0  9900001601        068
        %ccr: 99  %asi: 00  %cwp: 1  %pstate: 16<PEF,PRIV,IE>
     2  0000000001008c44  0000000001008c48  4400041401        034
        %ccr: 44  %asi: 00  %cwp: 1  %pstate: 414<MG,PEF,PRIV>

Example 2:

    panic: failed to stop cpu6

    panic[cpu5]/thread=2a100c97d40: bad kernel MMU miss at TL 2

    %tl %tpc              %tnpc             %tstate           %tt
     1  000000000104cf68  000000000104cf6c  4400001603        060
        %ccr: 44  %asi: 00  %cwp: 3  %pstate: 16<PEF,PRIV,IE>
     2  00000000010cf884  00000000010cf888  9900081404        068

Notes:

  1. The Asynchronous Event warning messages may or may not include the "test case reset reason =" line. A test case reset reason code ending in "7" indicates a "red_state" condition.
  2. If other errors are observed in the system logs, these should be investigated as well.
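
The rule in Note 1 can be applied mechanically to the "test case reset reason" lines. The sketch below is illustrative only. It assumes the code always appears as two 8-digit hexadecimal groups separated by a period, as in the examples above, and the helper names reset_reason and indicates_red_state are hypothetical.

```python
import re

# Assumed layout, taken from the examples above: "xxxxxxxx.xxxxxxxx" in hex.
RESET_REASON_RE = re.compile(
    r"test case reset reason\s*=\s*([0-9a-fA-F]{8}\.[0-9a-fA-F]{8})"
)


def reset_reason(line):
    """Return the reset reason code from a log line, or None if absent."""
    match = RESET_REASON_RE.search(line)
    return match.group(1) if match else None


def indicates_red_state(code):
    """Apply Note 1: a reset reason code ending in "7" means red_state."""
    return code.endswith("7")


# Line copied from the second example above.
line = "{/N0/SB4/P1} test case reset reason = 00000000.0404ff07"
code = reset_reason(line)
if code is not None:
    print(code, "red_state" if indicates_red_state(code) else "other")
```

Since the reset reason line may be missing from the message, reset_reason returns None in that case and the caller decides how to proceed.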

Solution Summary

4. Relief/Workaround

There is no workaround for this issue. Please see the Resolution section.


5. Resolution

This issue is addressed on the following platforms:

  • Sun Fire 2900, 3800, 4800, 4810, 4900, 6800, 6900 and V1280 servers with System Controller (ScApp) firmware version 5.19.0 (for Solaris 9) as delivered in ScApp firmware patch 114526-01 or later and Kernel update patch 117171-14 or later

Note: Kernel update patch version 117171-14 or higher is necessary to resolve BugID 4978865. Both patches must be installed to fully resolve this issue.
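
The requirement that both patches be present can be expressed as a simple check. This is a sketch under stated assumptions, not a Sun-provided procedure. It assumes the caller has already collected the installed patch IDs for the Solaris 9 domain and for the System Controller as "base-revision" strings, and it does not query the system itself. The function names and the sample patch lists are hypothetical.

```python
# Patch IDs and minimum revisions named in the Resolution section.
KERNEL_PATCH_BASE = "117171"   # Kernel update patch (Solaris 9)
KERNEL_PATCH_MIN_REV = 14
SCAPP_PATCH_BASE = "114526"    # ScApp firmware patch
SCAPP_PATCH_MIN_REV = 1


def patch_revision(patch_ids, base):
    """Return the highest revision of patch `base` in patch_ids, or None."""
    revisions = []
    for patch_id in patch_ids:
        patch_base, _, rev = patch_id.partition("-")
        if patch_base == base and rev.isdigit():
            revisions.append(int(rev))
    return max(revisions) if revisions else None


def is_resolved(domain_patches, sc_patches):
    """True only if both required patches are at or above the listed revision."""
    kernel_rev = patch_revision(domain_patches, KERNEL_PATCH_BASE)
    scapp_rev = patch_revision(sc_patches, SCAPP_PATCH_BASE)
    return (
        kernel_rev is not None and kernel_rev >= KERNEL_PATCH_MIN_REV
        and scapp_rev is not None and scapp_rev >= SCAPP_PATCH_MIN_REV
    )


# Hypothetical inputs: a domain one kernel patch revision short of the fix.
print(is_resolved(["117171-12"], ["114526-01"]))
# Hypothetical inputs: both patches at the minimum listed revisions.
print(is_resolved(["117171-14"], ["114526-01"]))
```

The check applies only to domains running Solaris 9, since the other Solaris releases are listed as not affected.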


Change History

01-Aug-2005:
  • Updated Contributing Factors and Resolution sections
05-Dec-2005:
  • Updated Contributing Factors section
22-Mar-2006:
  • Updated Contributing Factors and Resolution sections
23-Aug-2006:
  • Updated Contributing Factors and Resolution sections

This Sun Alert notification is being provided to you on an "AS IS" basis. This Sun Alert notification may contain information provided by third parties. The issues described in this Sun Alert notification may or may not impact your system(s). Sun makes no representations, warranties, or guarantees as to the information contained herein. ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This Sun Alert notification contains Sun proprietary and confidential information. It is being provided to you pursuant to the provisions of your agreement to purchase services from Sun, or, if you do not have such an agreement, the Sun.com Terms of Use. This Sun Alert notification may only be used for the purposes contemplated by these agreements.

Copyright 2000-2006 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.

Article Details
Article ID : 101673
Article Type : Sun Alert Notifications
Last reviewed : 2006-08-23
Audience : PUBLIC