Sun StorEdge 33x0/3510 Arrays May Report a Higher Incidence of Drive Failures With Firmware 4.1x SMART Feature Enabled
Category: Availability
Release Phase: Resolved
Product: Sun StorageTek 3310 SCSI Array, Sun StorageTek 3510 FC Array, Sun StorageTek 3320 SCSI Array
Bug ID: 6335242
Date of Workaround Release: 18-NOV-2005
Date of Resolved Release: 12-JAN-2006
Impact
When Self-Monitoring, Analysis and Reporting Technology (SMART) is enabled on existing (not newly installed) Sun StorEdge 33x0/3510 arrays that have been updated to firmware 4.1x, a higher incidence of drive failure errors may be reported.
Contributing Factors
This issue can occur on the following platforms:
- Sun StorEdge 33x0/3510 Arrays with firmware 4.1x or later with SMART enabled
On 33x0 and 3510 arrays that have already been in service for several months or more, some drives may be in an undetected marginal state: they are still functional, but are likely to report an impending failure once SMART is enabled on them. Arrays that already contain one or more such undetected marginal drives, and that have been in use for several months or more, are at increased risk from this issue.
Symptoms
SMART, a new feature in firmware 4.1x and later, may report a higher-than-expected incidence of drive failures when enabled. Multiple-drive (fatal) RAID failures may be reported within days of SMART being set to "Detect and Clone + Replace", which is the default setting in 4.1x array controller firmware.
SMART drive failure messages similar to the following are recorded in the array event log:
Sun Sep 25 18:01:27 2005
[Primary] Warning
SMART-CH:2 ID:1 Predictable Failure Detected-Starting Clone
Sun Sep 25 18:01:27 2005
[Primary] Notification
LG:0 Logical Drive NOTICE:CHL:2 ID:1 Starting Clone
Workaround
For an array that has already been in service for a prolonged period and whose drives may be close to actual failure, initially enable SMART with the "Detect and Perpetual Clone" setting. This provides a grace period in which to determine whether drive maintenance is needed.
For instructions on changing this setting, which should remain in place until drive maintenance can be scheduled, see: http://docs.sun.com/source/817-3711-17/ch09_scsidrives.html#pgfId-1021002
To perform this operation with the sccli command-line interface instead, see http://docs.sun.com/source/817-4951-17/04_channel.html#pgfId-1015924 and search for "smart".
Note that the Periodic Drive Check parameter must be enabled for SMART to function. Throughout this grace period, monitor the array's event log closely for drive SMART events.
Note: The following best practices lessen the impact of this issue:
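Because this workaround depends on watching the event log for SMART events, a small script can help make them stand out. The following is a minimal Python sketch, not a Sun-supplied tool: it assumes the event log has been saved as plain text with messages in the same form as the sample under Symptoms, and it counts "Predictable Failure Detected" warnings per channel and drive ID. The function name smart_events and the "more than one drive" alert threshold are hypothetical choices for illustration only.

```python
import re
from collections import Counter

# Assumed message format, taken from the sample event log in this alert, e.g.
# "SMART-CH:2 ID:1 Predictable Failure Detected-Starting Clone"
SMART_RE = re.compile(r"SMART-CH:(\d+)\s+ID:(\d+)\s+Predictable Failure Detected")


def smart_events(log_text: str) -> Counter:
    """Count SMART predictable-failure warnings per (channel, drive ID).

    Hypothetical helper: it only matches the SMART warning line, so the
    related "LG:... Starting Clone" notification is not double-counted.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = SMART_RE.search(line)
        if match:
            channel, drive_id = int(match.group(1)), int(match.group(2))
            counts[(channel, drive_id)] += 1
    return counts


SAMPLE_LOG = """\
Sun Sep 25 18:01:27 2005
[Primary] Warning
SMART-CH:2 ID:1 Predictable Failure Detected-Starting Clone
Sun Sep 25 18:01:27 2005
[Primary] Notification
LG:0 Logical Drive NOTICE:CHL:2 ID:1 Starting Clone
"""

if __name__ == "__main__":
    events = smart_events(SAMPLE_LOG)
    for (channel, drive_id), n in sorted(events.items()):
        print(f"CH:{channel} ID:{drive_id} -> {n} SMART warning(s)")
    # Illustrative threshold (an assumption, not from the alert): several drives
    # reporting at once is the pattern that precedes multiple-drive RAID failures.
    if len(events) > 1:
        print("More than one drive is reporting SMART warnings; review drive maintenance.")
```

Counting per drive, rather than per message, is a deliberate choice: the risk described above comes from several drives in the same array predicting failure within a short period.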
1. Configure at least one "spare" disk drive in each array, so that the array can automatically start a reconstruction immediately after a "Drive Failure" event.
2. Replace failed drives with known good drives at the earliest opportunity.
Resolution
See the "Workaround" section above.
Modification History
Date: 29-NOV-2005
- Updated Relief/Workaround (URL)
Date: 12-JAN-2006
- Updated Contributing Factors, Workaround, Resolution sections, re-release as Resolved
Date: 04-JAN-2008
- Updated URLs in Workaround section