Failed Controller Condition May Cause Data Integrity Issues

Category: Data Loss
Release Phase: Resolved
Product: Sun StorageTek 3310 SCSI Array, Sun StorageTek 3510 FC Array, Sun StorageTek 3320 SCSI Array, Sun StorageTek 3511 SATA Array
Bug Id: 6355818
Date of Workaround Release: 13-DEC-2005
Date of Resolved Release: 15-JUN-2006

Impact
On a Sun StorEdge 33x0/35xx array, when a failed RAID controller condition exists and the array is power cycled, data integrity issues may occur.
Contributing Factors
This issue can occur on the following platforms:
- Sun StorEdge 3310 SCSI array without firmware 4.15F (as delivered in patch 113722-15)
- Sun StorEdge 3320 SCSI array without firmware 4.15G (as delivered in patch 113730-01)
- Sun StorEdge 3510 FC array without firmware 4.15F (as delivered in patch 113723-15)
- Sun StorEdge 3511 FC/SATA array without firmware 4.15F (as delivered in patch 113724-09)
Note: This issue can occur with all current firmware revisions available for the Sun StorEdge 33x0/35xx arrays.
This issue can occur when the default primary RAID controller has failed and the array is power cycled while that controller is still in the failed state. Stale cache data held in the failed controller can then be written unexpectedly to disk.
Note: The default primary RAID controller is the controller with the higher serial number. The relevant serial number is the one reported by the CLI with the sccli syntax "sccli> show redundancy", not the serial number printed on the back of the FRU.
Symptoms
After the array is power cycled, the failed controller comes online and existing file systems on the array report fsck(1M) errors or other data integrity issues.
Workaround
Before resetting or power cycling the array, either discard the failed controller's cache or remove the failed controller from the array.
Note: Always replace the failed controller with the power on (the array can be power cycled with just one controller).
Scenario 1 - Spare controller available:
With the power on and the array operational on one controller, replace the failed controller with the spare. Removing the failed controller with power on, before the array is power cycled, prevents any stale data in that controller from being written out.
Scenario 2 - Spare controller unavailable:
Option 1: This option assumes you have in-band or out-of-band sccli access to the array.
Unfail the controller using the sccli syntax "sccli> unfail". The failed controller's cache is discarded when the controller is put back online as a secondary controller. If this command fails, follow Option 2.
Option 2: Before resetting or power cycling the array, remove the failed controller. Then remove the battery module from the failed controller for at least 5 seconds and reinsert it to invalidate that controller's cache. To maintain proper air flow, partially reinsert the controller until it is 1 inch from its fully seated position.
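The choice between Scenario 1 and the two options of Scenario 2 can be summarized as a small decision helper. This is only an illustrative sketch: the function name `choose_workaround` and its parameters are hypothetical and are not part of sccli or any Sun tool. It encodes nothing beyond the order of preference described above.

```python
def choose_workaround(spare_available, cli_access, unfail_failed=False):
    """Return the workaround from this alert that applies to the situation.

    All names here are hypothetical; the function only mirrors the
    decision order described in the Workaround section.
    """
    if spare_available:
        # Scenario 1: swap the failed controller while power stays on.
        return "Scenario 1: replace the failed controller with power on"
    if cli_access and not unfail_failed:
        # Scenario 2, Option 1: needs in-band or out-of-band sccli access.
        return "Scenario 2, Option 1: unfail the controller from sccli"
    # Scenario 2, Option 2: no CLI access, or the unfail command failed.
    return ("Scenario 2, Option 2: remove the failed controller and pull "
            "its battery module for at least 5 seconds")


if __name__ == "__main__":
    print(choose_workaround(spare_available=False, cli_access=True))
```

In every case the selected step must be completed before the array is reset or power cycled.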
Please refer to Sun documentation at: http://docs.sun.com/app/docs/doc/816-7326-20
Please refer to the corresponding documents for the required firmware levels to identify failed controllers:
"Sun StorEdge 3000 Family Installation, Operation, and Service Manual" collection at http://www.sun.com/products-n-solutions/hardware/docs/Network_Storage_Solutions/Workgroup/index.html
and the "Sun StorEdge 3000 Family CLI User's Guide" at http://www.sun.com/products-n-solutions/hardware/docs/html/817-4951-14
Resolution
This issue is addressed on the following platforms:
- Sun StorEdge 3310 SCSI array with firmware 4.15F (as delivered in patch 113722-15 or later)
- Sun StorEdge 3320 SCSI array with firmware 4.15G (as delivered in patch 113730-01 or later)
- Sun StorEdge 3510 FC array with firmware 4.15F (as delivered in patch 113723-15 or later)
- Sun StorEdge 3511 FC/SATA array with firmware 4.15F (as delivered in patch 113724-09 or later)
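For sites that track many arrays, the firmware levels listed above can be checked with a short script. The following is a minimal sketch, not a Sun-supplied tool: the names `FIXED_FIRMWARE`, `parse_firmware`, and `is_exposed` are hypothetical, and it assumes that firmware versions have the form major.minor followed by an optional letter and that they sort numerically and then alphabetically by that letter. How the installed firmware version is obtained from an array is outside the scope of the sketch.

```python
import re

# Firmware level that addresses this issue, per model, as listed in this alert.
FIXED_FIRMWARE = {
    "3310": "4.15F",
    "3320": "4.15G",
    "3510": "4.15F",
    "3511": "4.15F",
}


def parse_firmware(version):
    """Split a string such as '4.15F' into a comparable tuple (4, 15, 'F').

    Assumption: versions look like <major>.<minor><optional letter>.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)([A-Z]?)", version.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized firmware version: {version!r}")
    major, minor, letter = match.groups()
    return (int(major), int(minor), letter)


def is_exposed(model, firmware):
    """Return True if the firmware is older than the fixed level for the model."""
    return parse_firmware(firmware) < parse_firmware(FIXED_FIRMWARE[model])


if __name__ == "__main__":
    for model, firmware in [("3310", "4.11I"), ("3510", "4.15F")]:
        state = "exposed" if is_exposed(model, firmware) else "addressed"
        print(f"Sun StorEdge {model} with firmware {firmware}: {state}")
```

An array reported as exposed should be handled according to the Workaround section until the corresponding patch is applied.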
Modification History
12-Jan-2006:
- Updated Contributing Factors and Relief/Workaround sections
25-Apr-2006:
- Updated Contributing Factors and Resolution sections
15-Jun-2006:
- Updated Contributing Factors and Resolution sections
Attachments
This solution has no attachment.