Failed Controller Condition May Cause Data Integrity Issues



Category: Data Loss
Release Phase: Resolved
Product: Sun StorageTek 3310 SCSI Array
         Sun StorageTek 3510 FC Array
         Sun StorageTek 3320 SCSI Array
         Sun StorageTek 3511 SATA Array
Bug Id: 6355818
Date of Workaround Release: 13-DEC-2005
Date of Resolved Release: 15-JUN-2006


Impact

On a Sun StorEdge 33x0/35xx array, when a failed RAID controller condition exists and the array is power cycled, data integrity issues may occur.


Contributing Factors

This issue can occur on the following platforms:

  • Sun StorEdge 3310 SCSI array without firmware 4.15F (as delivered in patch 113722-15)
  • Sun StorEdge 3320 SCSI array without firmware 4.15G (as delivered in patch 113730-01)
  • Sun StorEdge 3510 FC array without firmware 4.15F (as delivered in patch 113723-15)
  • Sun StorEdge 3511 FC/SATA array without firmware 4.15F (as delivered in patch 113724-09)

Note: This issue can occur with all firmware revisions for the Sun StorEdge 33x0/35xx arrays that are earlier than those listed above.
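When auditing several arrays, the model-to-firmware list above can be written as a small Python check. This is a sketch, not a Sun tool. It assumes revision strings of the form major.minor plus an optional letter (such as "4.15F"), compares the minor number as an integer, and assumes that every revision after the listed one keeps the fix. The model keys and function names are hypothetical.

```python
import re

# Fixed firmware revision per model, copied from the list above.
FIXED_FIRMWARE = {
    "3310": "4.15F",
    "3320": "4.15G",
    "3510": "4.15F",
    "3511": "4.15F",
}

# Assumed revision format: <major>.<minor> plus an optional letter, e.g. "4.15F".
_REV_RE = re.compile(r"^(\d+)\.(\d+)([A-Z]?)$")


def parse_revision(rev: str) -> tuple[int, int, str]:
    """Turn a revision string such as "4.15F" into a sortable tuple."""
    match = _REV_RE.match(rev.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized firmware revision: {rev!r}")
    major, minor, letter = match.groups()
    # A revision with no letter sorts before any lettered one ("" < "A").
    return int(major), int(minor), letter


def is_exposed(model: str, firmware: str) -> bool:
    """True if the running firmware is older than the fixed revision.

    Assumes every later revision keeps the fix.
    """
    return parse_revision(firmware) < parse_revision(FIXED_FIRMWARE[model])


if __name__ == "__main__":
    # Hypothetical model/firmware pairs, for illustration only.
    for model, fw in [("3510", "4.14B"), ("3320", "4.15F"), ("3320", "4.15G")]:
        print(model, fw, "exposed" if is_exposed(model, fw) else "fixed")
```

Note that the 3320 needs 4.15G, so a 3320 running 4.15F is still reported as exposed.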

This issue can occur when the default primary RAID controller has failed and the array is power cycled while that failure condition exists. On power-up, stale cache data held in the failed controller can be written unexpectedly to disk.

Note: The default primary RAID controller is the controller with the higher serial number. Determine the serial numbers from the CLI with the sccli command "sccli> show redundancy"; these are not the serial numbers printed on the back of the FRUs.
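The rule for identifying the default primary controller, and the failure condition that exposes the array, can be sketched in Python. The sketch assumes the two controller serial numbers have already been read from the "sccli> show redundancy" output and are decimal digit strings; the function names and sample serial numbers are hypothetical.

```python
def default_primary(serial_a: str, serial_b: str) -> str:
    """Return the higher of two controller serial numbers.

    Per this alert, that controller is the default primary. The serials are
    assumed to be decimal digit strings taken from sccli, not the FRU label.
    """
    a, b = int(serial_a), int(serial_b)
    if a == b:
        raise ValueError("two controllers should not share a serial number")
    return serial_a if a > b else serial_b


def is_primary_failure(failed_serial: str, serial_a: str, serial_b: str) -> bool:
    """True if the failed controller is the default primary, the condition
    under which stale cache data may reach disk on a power cycle."""
    return int(failed_serial) == int(default_primary(serial_a, serial_b))


if __name__ == "__main__":
    # Hypothetical serial numbers, for illustration only.
    print(default_primary("3001198", "3001234"))                # 3001234
    print(is_primary_failure("3001198", "3001198", "3001234"))  # False
```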


Symptoms

After the array is power cycled, the failed controller comes back online, and file systems on the array report fsck(1M) errors or other data integrity problems.


Workaround

Before the array is reset or power cycled, either discard the failed controller's cache or remove the failed controller from the array.

Note: Always replace the failed controller with the power on. The array can be power cycled with only one controller installed.

Scenario 1 - Spare controller available:

With the power on and the array running on one controller, replace the failed controller with the spare. Removing the failed controller with the power on, before the array is power cycled, prevents any stale data in the failed controller from being written out.

Scenario 2 - Spare controller unavailable:

Option 1: This option assumes that you have in-band or out-of-band sccli access to the array.

Unfail the controller with the sccli command "sccli> unfail". The failed controller's cache is discarded when the controller is brought back online as the secondary controller. If this command fails, follow Option 2.

Option 2: Before resetting or power cycling the array, remove the failed controller. Then remove the battery module from the failed controller for at least 5 seconds and reinsert it; this invalidates the failed controller's cache. To maintain proper airflow, partially reinsert the controller so that it sits 1 inch short of its fully seated position.
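The choice between the scenarios and options above can be summarized as a short decision procedure. This Python sketch only returns the steps in the order to try them; it does not communicate with the array, and the function and constant names are hypothetical.

```python
# Step descriptions paraphrase the Workaround section above.
SCENARIO_1 = "Scenario 1: with the power on, replace the failed controller with the spare"
OPTION_1 = 'Scenario 2, Option 1: run "sccli> unfail" to discard the failed controller\'s cache'
OPTION_2 = ("Scenario 2, Option 2: remove the failed controller, pull its battery "
            "for at least 5 seconds, reinsert it, and leave the controller 1 inch out")


def plan_workaround(spare_available: bool, has_sccli_access: bool) -> list[str]:
    """Ordered steps to try before any reset or power cycle of the array."""
    if spare_available:
        return [SCENARIO_1]
    steps = []
    if has_sccli_access:
        # Option 1 needs sccli access; if "unfail" fails, fall through to Option 2.
        steps.append(OPTION_1)
    # Option 2 is always the fallback when no spare is on hand.
    steps.append(OPTION_2)
    return steps


if __name__ == "__main__":
    for step in plan_workaround(spare_available=False, has_sccli_access=True):
        print(step)
```

Without sccli access, the only step returned is Option 2.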

Please refer to Sun documentation at: http://docs.sun.com/app/docs/doc/816-7326-20

To identify failed controllers, refer to the documents that correspond to the required firmware levels:

"Sun StorEdge 3000 Family Installation, Operation, and Service Manual" collection at http://www.sun.com/products-n-solutions/hardware/docs/Network_Storage_Solutions/Workgroup/index.html

and the "Sun StorEdge 3000 Family CLI User's Guide" at http://www.sun.com/products-n-solutions/hardware/docs/html/817-4951-14


Resolution

This issue is addressed on the following platforms:

  • Sun StorEdge 3310 SCSI array with firmware 4.15F (as delivered in patch 113722-15 or later)
  • Sun StorEdge 3320 SCSI array with firmware 4.15G (as delivered in patch 113730-01 or later)
  • Sun StorEdge 3510 FC array with firmware 4.15F (as delivered in patch 113723-15 or later)
  • Sun StorEdge 3511 FC/SATA array with firmware 4.15F (as delivered in patch 113724-09 or later)
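Because the fix ships in specific patches, an installed patch ID such as "113722-15" can be checked against the minimum revision for each model. The Python sketch below assumes patch IDs of the form base-revision and treats a patch with a different base ID as not containing the fix; it does not account for any patch that obsoletes these ones. All names are hypothetical.

```python
# Minimum (base ID, revision) delivering the fix, per model, from the list above.
MIN_PATCH = {
    "3310": ("113722", 15),
    "3320": ("113730", 1),
    "3510": ("113723", 15),
    "3511": ("113724", 9),
}


def parse_patch(patch_id: str) -> tuple[str, int]:
    """Split a patch ID such as "113722-15" into ("113722", 15)."""
    base, _, rev = patch_id.strip().partition("-")
    if not base.isdigit() or not rev.isdigit():
        raise ValueError(f"unrecognized patch ID: {patch_id!r}")
    return base, int(rev)


def patch_includes_fix(model: str, patch_id: str) -> bool:
    """True if patch_id is this model's patch at the minimum revision or later."""
    base, rev = parse_patch(patch_id)
    want_base, want_rev = MIN_PATCH[model]
    return base == want_base and rev >= want_rev


if __name__ == "__main__":
    # Hypothetical installed patches, for illustration only.
    for model, patch in [("3310", "113722-14"), ("3320", "113730-01"), ("3510", "113724-09")]:
        print(model, patch, patch_includes_fix(model, patch))
```

The last pair prints False because 113724 is the 3511 patch, not the 3510 patch.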



Modification History


Date: 12-JAN-2006

  • Updated Contributing Factors and Relief/Workaround

Date: 25-APR-2006

  • Updated Contributing Factors and Resolution sections

Date: 15-JUN-2006

  • Updated Contributing Factors and Resolution sections




 
 

Article Details
Article ID: 200893
Article Type: Sun Alert
Last reviewed: 2006-06-15
Audience: PUBLIC
 
Copyright Sun Microsystems, Inc.