Sun StorEdge T3 and T3+ Arrays (Including SE3900 and SE6900 Series) May Reset and/or Briefly Lose Host Connectivity After Running Continuously for 497 Days



Category :Availability, Data Loss
Release Phase :Resolved
Product :Sun StorageTek 3900 Series
Sun StorageTek 6900 Series
Sun StorageTek T3 Array
Sun StorageTek T3+ Array  
Bug Id :4785593  
Date of Workaround Release :30-JAN-2003 
Date of Resolved Release :30-MAY-2003 


Impact

Sun StorEdge T3 and T3+ arrays (including Sun StorEdge T3+ arrays contained in the Sun StorEdge 3900 and Sun StorEdge 6900 Series) may reset and/or lose host connectivity for 2-3 minutes if they have been running continuously for exactly 497 days and I/O operations are in progress at that time.

Data may become unavailable or may be permanently lost, depending on the system configuration and on how applications react when the arrays reset or temporarily lose host connectivity.


Contributing Factors

This issue can occur with the following configurations:

  • Sun StorEdge T3 array with firmware 1.18.01 or earlier
  • Sun StorEdge T3+ array with firmware 2.01.03 or earlier
  • Sun StorEdge 3900 Series (SE39xx) containing Sun StorEdge T3+ arrays with firmware 2.01.03 or earlier
  • Sun StorEdge 6900 Series (SE69xx) containing Sun StorEdge T3+ arrays with firmware 2.01.03 or earlier

This issue occurs only if I/O operations are in progress on the array at the moment it has been running continuously for exactly 497 days. If the array is idle at that moment, the issue does not occur.
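This alert does not state why the threshold is 497 days. As an illustration only, the following sketch shows that the figure matches the time a 32-bit counter takes to wrap when it advances 100 times per second. The counter width and tick rate are assumptions chosen for the arithmetic, not details taken from this alert or from the T3/T3+ firmware.

```python
from datetime import timedelta

# Assumptions (NOT stated in this alert): an unsigned 32-bit counter
# that advances once every 10 ms, i.e. 100 ticks per second.
COUNTER_BITS = 32
TICKS_PER_SECOND = 100


def wrap_interval(bits=COUNTER_BITS, hz=TICKS_PER_SECOND):
    """Return how long a `bits`-wide counter runs at `hz` ticks/s before wrapping."""
    return timedelta(seconds=(2 ** bits) / hz)


interval = wrap_interval()
print(interval.days)  # 497 whole days under the assumptions above
```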

Note: There is no "uptime" or similar command on the T3/T3+. To determine how long a T3/T3+ has been running, find the date of its last boot by reviewing the T3/T3+ syslog (or remote logging file) or any change logs kept by the system administrator. Applying new firmware to a T3/T3+ requires an array reboot. T3/T3+ arrays whose firmware has been kept up to date will therefore have been rebooted during each update and are less likely to be affected by this issue.
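Once the last boot date is known, the date by which the array would reach 497 days can be worked out directly. The sketch below is a minimal example of that date arithmetic. The function names are hypothetical, and the last boot date must be supplied by hand because this alert does not show the format of the boot entry in the syslog.

```python
from datetime import date, timedelta

UPTIME_LIMIT_DAYS = 497  # threshold named in this alert


def reboot_deadline(last_boot):
    """Return the date on which the array reaches 497 days of uptime."""
    return last_boot + timedelta(days=UPTIME_LIMIT_DAYS)


def days_remaining(last_boot, today):
    """Return the days left before the threshold (negative if already past)."""
    return (reboot_deadline(last_boot) - today).days


# Example with a made-up boot date; replace it with the date from the syslog.
last_boot = date(2002, 1, 1)
print(reboot_deadline(last_boot))                   # 2003-05-13
print(days_remaining(last_boot, date(2003, 5, 1)))  # 12
```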


Symptoms

1. Messages similar to the following may be seen in the "/var/adm/messages" file:

    unix: ID[SUNWssa.socal.link.5010] socal3: port 0: Fibre Channel is OFFLINE
    unix: WARNING: /sbus@49,0/SUNW,socal@1,0/sf@0,0 (sf6):
    unix:        Offline Timeout
    unix: sf6:   target 0x2 al_pa 0xe4 offlined
    unix: WARNING: /sbus@49,0/SUNW,socal@1,0/sf@0,0/ssd@w50020f2300007193,0 (ssd3):
    unix:        SCSI transport failed: reason 'tran_err': giving up

The above messages indicate a loss of host connectivity to a T3/T3+ array and may occur for different reasons. Should the issue described in this Sun Alert document occur, one of the sets of T3/T3+ messages listed below will also be seen:

2. As it restarts, the T3/T3+ syslog may record the following reason for the T3/T3+ resetting:

    ROOT[1]: W: u1ctr Assertion Reset (3000) was initiated at yyyymmdd 
    hhmmss ../../common/bss/qlcf.c line xxxx, Assert(cmd->cmd_deadline 
    != CAM_TIME_INFINITY) => 0 BOOT

3. The T3/T3+ syslog may record messages similar to the following, at the time that host connectivity is lost:

    ISR1[1]: N: u1ctr ISP2100[2] Fatal timeout on host 125
    ISR1[1]: N: u1ctr ISP2100[2] Received LIP(f0,f0) async event
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_invalidate_pdb: PDB Invalidate (host id 125)
    FCC0[1]: N: u1ctr Port event received on port 0, abort 0 (id 125)
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_sync_pdb: PDB Sync Initiated (host id 125)
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_update_pdb: PDB Sync Done (host id 125)
    ISR1[1]: N: u1ctr ISP2100[2] PDB Sync Done (host id 125,
                                 host WWN 2004020000101a00)
    FCC0[1]: N: u1ctr PDB Changed on port 0 (id 125)
    [...]

10 seconds later:

    ISR1[1]: N: u1ctr ISP2100[2] Fatal timeout on host 125
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_i_watch_host_port: Debug Code - ISP2100 Hang
                                 Detected
    ISR1[1]: N: u1ctr ISP2100[2] interface going offline
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_init_pdb: PDB Initialize
    ISR1[1]: N: u1ctr ISP2100[2] QLCF_I_ABORT_ALL_TM_CMDS: Target-mode Flush Started
                                 (lun = 0x0)
    ISR1[1]: N: u1ctr ISP2100[2] interface going online
    [...]
    SIMT[1]: N: u1ctr Initializing host port u1p1 ISP2100 ... firmware status = 7
    [...]
    DUMP[1]: N: u1ctr ISP2100[2] [==>BEG]ISPDEBUGDUMP:
    DUMP[1]: N: u1ctr ISP2100[2]  PBIU REGISTERS (OFFSET 00H, 8):
    DUMP[1]: N: u1ctr ISP2100[2]        0000 0001 0002 0003 0004 0005 0006 0007
    DUMP[1]: N: u1ctr ISP2100[2]        ---- ---- ---- ---- ---- ---- ---- ----
    [... followed by lines of hex data ...]
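
To check a saved copy of the T3/T3+ syslog for the messages shown above, a simple pattern search is usually enough. The following sketch counts lines that contain substrings copied from the sample messages in this alert. The function name and the choice of substrings are assumptions made for illustration, and a match is only a hint that should be confirmed against the full message sets above.

```python
import re

# Substrings copied from the sample T3/T3+ syslog messages in this alert.
SIGNATURES = {
    "assertion_reset": re.compile(r"Assertion Reset \(3000\)"),
    "cam_time_infinity": re.compile(r"CAM_TIME_INFINITY"),
    "fatal_timeout": re.compile(r"Fatal timeout on host \d+"),
    "isp_hang": re.compile(r"ISP2100 Hang"),
}


def scan_syslog(lines):
    """Count how many of the given lines match each signature."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


sample = [
    "ISR1[1]: N: u1ctr ISP2100[2] Fatal timeout on host 125",
    "ISR1[1]: N: u1ctr ISP2100[2] qlcf_i_watch_host_port: Debug Code - ISP2100 Hang",
    "ISR1[1]: N: u1ctr ISP2100[2] interface going offline",
]
print(scan_syslog(sample))
```

To scan a file instead of a list, pass the open file object, for example `scan_syslog(open(path))` where `path` is wherever the syslog copy was saved.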

Workaround

To prevent this issue, reboot the Sun StorEdge T3/T3+ array before it has been running continuously for 497 days.


Resolution

This issue is addressed in the following releases:

  • Sun StorEdge T3 array with firmware 1.18.02 or later (firmware 1.18.02 is available as patch 109115-13)
  • Sun StorEdge T3+ array with firmware 2.01.04 or later (firmware 2.01.04 is available as patch 112276-07)
  • Sun StorEdge 3900 Series (SE39xx) containing Sun StorEdge T3+ arrays with firmware 2.01.04 or later (firmware 2.01.04 is available as patch 112276-07)
  • Sun StorEdge 6900 Series (SE69xx) containing Sun StorEdge T3+ arrays with firmware 2.01.04 or later (firmware 2.01.04 is available as patch 112276-07)
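
The firmware thresholds listed above can be checked with a plain version comparison. The sketch below assumes the firmware level is available as a three-part dotted string such as "2.01.03"; the function names and the model labels "T3" and "T3+" are hypothetical. For 3900 and 6900 Series systems, check the T3+ arrays they contain.

```python
# First fixed firmware level per array type, taken from the list above.
FIRST_FIXED = {
    "T3": (1, 18, 2),   # 1.18.02, patch 109115-13
    "T3+": (2, 1, 4),   # 2.01.04, patch 112276-07
}


def parse_version(text):
    """Turn a dotted version such as '2.01.03' into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


def is_affected(model, firmware):
    """Return True if the firmware is older than the first fixed release."""
    return parse_version(firmware) < FIRST_FIXED[model]


print(is_affected("T3", "1.18.01"))   # True
print(is_affected("T3+", "2.01.04"))  # False
```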



Modification History


Date: 30-MAR-2003
  • Sun StorEdge T3 array firmware 1.18.02 is available as patch 109115-13

Date: 30-MAY-2003
  • Sun StorEdge T3+ array firmware 2.01.04 is available as patch 112276-07




Article Details
Article ID : 201217
Article Type : Sun Alert
Last reviewed : 2003-05-30
Audience : PUBLIC