Data Inconsistencies May Occur When Persistent SCSI Parity Errors are Generated Between the Host and the SE33x0 Array

| Category : | Data Loss |
| Release Phase : | Resolved |
| Product : | Sun StorageTek 3310 SCSI Array, Sun StorageTek 3320 SCSI Array |
| Bug Id : | 6363490, 6378796 |
| Date of Workaround Release : | 12-Jan-2006 |
| Date of Resolved Release : | 13-Mar-2008 |

1. Impact
When the connection between the SE33x0 array and the host has degraded to the point that WRITE requests cannot be completed, persistent SCSI parity errors may be generated between the host and the SE33x0 array, and data inconsistencies may occur.
2. Contributing Factors
This issue can occur on the following platforms:
- Sun StorEdge 3310 SCSI array without firmware 4.15F (as delivered in patch 113722-15)
- Sun StorEdge 3320 SCSI array without firmware 4.15G (as delivered in patch 113730-01)
SCSI parity errors can cause invalid data to be written into the array's cache. Prior to firmware version 4.15, this data was eventually flushed to the disk media, permanently storing the invalid data on the volume. Firmware version 4.15 was modified to discard the corrupted data rather than write it to disk media, which reduces the probability of corrupting the volume. However, in the rare case where the failed write command overlaps a prior write command's data that still resides in cache, that prior data is also discarded.
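The difference between the two firmware behaviors can be illustrated with a toy model. This is a minimal sketch, not the array's actual cache implementation: the `ToyWriteCache` class, the `CORRUPT` marker, and the `replay` helper are hypothetical names, and the model assumes that only the overlapping blocks of the earlier write are discarded.

```python
from dataclasses import dataclass, field

CORRUPT = "<corrupt>"  # illustrative marker for data damaged in transit


@dataclass
class ToyWriteCache:
    """Toy write-back cache keyed by LBA (hypothetical, for illustration)."""
    discard_on_parity_error: bool                 # True models firmware 4.15
    dirty: dict = field(default_factory=dict)     # LBA -> data awaiting flush
    disk: dict = field(default_factory=dict)      # LBA -> data on disk media

    def write(self, lba, count, data, parity_error=False):
        for block in range(lba, lba + count):
            if not parity_error:
                self.dirty[block] = data
            elif self.discard_on_parity_error:
                # 4.15 model: drop the bad data, and with it any earlier
                # cached write to the same block.
                self.dirty.pop(block, None)
            else:
                # Pre-4.15 model: the invalid data stays in cache.
                self.dirty[block] = CORRUPT
        return not parity_error

    def flush(self):
        self.disk.update(self.dirty)
        self.dirty.clear()


def replay(discard):
    cache = ToyWriteCache(discard_on_parity_error=discard)
    cache.write(100, 4, "A")                     # good write, LBAs 100-103
    cache.write(102, 4, "B", parity_error=True)  # failed write, LBAs 102-105
    cache.flush()
    return cache.disk
```

In this model, `replay(False)` leaves the corrupt marker on LBAs 102 through 105, while `replay(True)` writes nothing invalid to disk but also loses the earlier good data for LBAs 102 and 103.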
Single Path Configurations
Configurations in which a host has only one path to one or more logical units on the array are exposed to this problem. Because there is no redundant path between the host and the SE33x0 array, a failed WRITE request cannot be retried over a second path.
When using firmware version 4.15 in this configuration, if any write commands fail due to parity errors, write data in cache may be lost if the application or file system issued writes to overlapping LBAs.
When using older firmware in this configuration, the LBAs of any WRITE request that cannot be completed as a result of a PARITY ERROR returned by the SE33x0 should be considered to contain invalid data.
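Under either firmware level, the LBAs covered by failed WRITE requests are the ones to verify or rewrite. As a minimal sketch, assuming the failed requests have already been collected from host logs as hypothetical `(start_lba, block_count)` pairs, the suspect regions can be merged into non-overlapping ranges:

```python
def suspect_lba_ranges(failed_writes):
    """Merge (start_lba, block_count) pairs into sorted, non-overlapping
    half-open (start, end) LBA ranges."""
    spans = sorted(
        (start, start + count) for start, count in failed_writes if count > 0
    )
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlapping or adjacent to the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For example, failed writes of 8 blocks at LBA 100, 8 blocks at LBA 104, and 16 blocks at LBA 500 reduce to the two ranges 100 to 112 and 500 to 516 (end exclusive).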
Multi Path/High Availability Configurations
The exposure for a properly configured High Availability configuration using a host multi-pathing driver and multiple separate connections between the host(s) and the SE33x0 array is very small. In this configuration, the multi-pathing driver in the host will use the second, non-compromised path to the array controller to retry the WRITE request. A successful retry will write the intended data to the correct LBAs, with the following exceptions:
1. If the SE33x0 array or the host experiences a power failure between the failed WRITE request and the successful completion of the retry down the second path, the data for the failed WRITE request should be considered invalid.
2. If the Host OS experiences a crash or a multi-path driver error between the failed WRITE request and the successful completion of the retry down the second path, the data for the failed WRITE request should be considered invalid.
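The retry behavior described above can be sketched as follows. This is an illustration of the idea only, not the logic of any real multi-pathing driver: `ParityError`, `write_with_failover`, and the path callables are hypothetical stand-ins.

```python
class ParityError(Exception):
    """Stands in for a SCSI "Parity Error" status returned on one path."""


def write_with_failover(paths, lba, data):
    """Try each path in order and return the index of the one that succeeded.

    `paths` is a list of callables taking (lba, data); each is a hypothetical
    stand-in for one host-to-array connection.
    """
    last_error = None
    for index, send in enumerate(paths):
        try:
            send(lba, data)
            return index
        except ParityError as exc:
            last_error = exc  # path is compromised; fall through to the next
    # No path completed the request: the data for these LBAs is suspect.
    raise IOError("WRITE failed on all paths") from last_error
```

Note the window between the failure on the first path and the success on the second: a power failure, host crash, or driver error inside that window corresponds to the two exceptions listed above.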
3. Symptoms
Should the described issue occur, persistent SCSI parity errors will be generated between the host and the SE33x0 array. The SE33x0 array will return a SCSI status of "Parity Error" to the host SCSI Host Bus Adapter (HBA). Typically, the host SCSI HBA will retry the WRITE request some number of times (most drivers attempt between 2 and 6 retries) before returning the WRITE request to the application with a FAILURE status.
4. Workaround
There is no workaround for this issue. Please see the resolution section below.
5. Resolution
The issue described in BugID 6363490 is addressed on the following platforms:
- Sun StorEdge 3310 SCSI array with firmware revision 4.15F (as delivered in patch 113722-15 or later)
- Sun StorEdge 3320 SCSI array with firmware revision 4.15G (as delivered in patch 113730-01 or later)
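A firmware revision string can be compared against the fixed revisions listed above with a short helper. This is a sketch under a stated assumption: that revisions of the form `4.15F` order by major number, then minor number, then trailing letter. The `MIN_FIXED`, `parse_revision`, and `has_fix` names are hypothetical; the patch README remains the authoritative check.

```python
import re

# Minimum fixed revisions, taken from the list above.
MIN_FIXED = {"3310": "4.15F", "3320": "4.15G"}


def parse_revision(rev):
    """Split a revision such as '4.15F' into (4, 15, 'F')."""
    match = re.fullmatch(r"(\d+)\.(\d+)([A-Z])", rev.strip().upper())
    if not match:
        raise ValueError(f"unrecognized firmware revision: {rev!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


def has_fix(model, rev):
    """True if `rev` is at or above the fixed revision for `model`.

    Assumes (major, minor, letter) ordering, which is an assumption of
    this sketch rather than a documented rule.
    """
    return parse_revision(rev) >= parse_revision(MIN_FIXED[model])
```

For example, `has_fix("3310", "4.11I")` is false under this ordering, while `has_fix("3310", "4.15F")` is true.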
Note: Ensure that SCSI connections are reliable and properly configured to minimize the probability of parity errors, and use multiple SCSI connections with failover drivers.
Because the changes needed would require a major redesign, the issue described in BugID 6378796 was closed as "will not fix."
This Sun Alert notification is being provided to you on an "AS IS"
basis. This Sun Alert notification may contain information provided by
third parties. The issues described in this Sun Alert notification may
or may not impact your system(s). Sun makes no representations,
warranties, or guarantees as to the information contained herein. ANY
AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU
ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT
OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This
Sun Alert notification contains Sun proprietary and confidential
information. It is being provided to you pursuant to the provisions of
your agreement to purchase services from Sun, or, if you do not have
such an agreement, the Sun.com Terms of Use. This Sun Alert
notification may only be used for the purposes contemplated by these
agreements.
Copyright 2000-2008 Sun Microsystems, Inc., 4150 Network Circle, Santa
Clara, CA 95054 U.S.A. All rights reserved.
Modification History
14-Jun-2006: Updated Contributing Factors and Resolution Sections
13-Mar-2008: Updated Resolution section - RESOLVED