Upgrading MAP3735FC (Allegro 8 Series) Drives to Revision 1201 Firmware May Result In Loss of Access to Volumes After Subsequent Reboot

| Category : | Availability, Data Loss |
| Release Phase : | Resolved |
| Product : | Sun StorageTek 3960, Sun StorageTek T3 Array, Sun StorageTek T3+ Array, Sun StorageTek 6910, Sun StorageTek 6960, Sun StorageTek 3910 |
| Bug Id : | 5077820 |
| Date of Workaround Release : | 12-AUG-2004 |
| Date of Resolved Release : | 31-Mar-2008 |

1. Impact
Upgrading the MAP3735FC (Allegro 8 Series) drives to revision 1201 firmware may cause a loss of access to data volumes after a subsequent reboot, and possibly result in a loss of data integrity on those volumes.
2. Contributing Factors
This issue can occur on the following platforms:
SPARC Platform
- Sun StorEdge T3+ Array with MAP3735FC drives and firmware revision 1201
- Sun StorEdge 3910, 3960, 6910, and 6960 Arrays with MAP3735FC drives and firmware revision 1201
Note 1: This Sun Alert only applies to those systems that have MAP3735FC drives and have been upgraded to 1201 drive firmware and have lost access to their volumes (unmounted and drives offline/disabled). If drives are in this state then follow the procedure in the "Relief/Workaround" section of this Sun Alert.
Note 2: If this is a new system with MAP3735FC 1201 drive firmware and the system is working normally (volumes mount correctly after a reset), then ignore this Sun Alert. This issue is only exhibited after a manual disk firmware upgrade and the first subsequent reset.
Note 3: If the MAP3735FC drive firmware has been manually upgraded to 1201 and the volumes have already been recovered after a subsequent reset, then ignore this Sun Alert. This issue is only exhibited after a manual disk firmware upgrade and the first subsequent reset, and it does not recur once the volumes have been recovered.
Note 4: If you are currently at drive firmware revision 1201 or later and have either performed the recovery steps in the "Relief/Workaround" section or have received your system with 1201 or later already installed from the factory, then you MUST stay at 1201. If you try to downgrade to 0801 then you WILL trigger the condition described in this Sun Alert. There are no operational data corruption issues with either 0801 or 1201 drive firmware.
Note 5: If you are currently at drive firmware revision 0801 then you MUST STAY at drive firmware revision 0801. DO NOT attempt to upgrade to 1201 or you WILL trigger the condition described in this Sun Alert.
Note 6: Hot plugging drives (in response to drive failures) with either firmware revision (0801 or 1201) WILL NOT trigger the condition described in this Sun Alert.
Note 7: After a disk drive replacement (hotplug event) do not attempt to upgrade or downgrade the drive firmware on the replacement drive.
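The notes above amount to a small decision table. The following Python sketch is only an illustrative summary of Notes 1 through 5. The function name `advise` and its parameters are hypothetical, and it is not a Sun-supplied tool. It assumes the drive revision is known to be either 0801 or 1201 and returns "not covered" for anything else.

```python
def advise(revision, manually_upgraded, volumes_accessible):
    """Summarize Notes 1-5 for a MAP3735FC drive set (illustrative only).

    revision            -- drive firmware revision string, e.g. "0801" or "1201"
    manually_upgraded   -- True if 1201 was applied with the disk download utility
    volumes_accessible  -- True if the volumes are mounted and drives are online
    """
    if revision == "0801":
        # Note 5: stay at 0801; upgrading triggers the condition.
        return "stay at 0801; do not upgrade"
    if revision == "1201":
        if manually_upgraded and not volumes_accessible:
            # Note 1: drives offline/disabled after a manual upgrade and reset.
            return "follow the workaround procedure"
        # Notes 2-4: factory 1201, or already recovered; downgrading triggers it.
        return "stay at 1201; do not downgrade"
    # Other revisions are outside what this summary encodes.
    return "not covered by this summary"


print(advise("1201", manually_upgraded=True, volumes_accessible=False))
```

Note 7 is not modeled here: a hot-plugged replacement drive should simply be left at whatever firmware revision it arrived with.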
3. Symptoms
After applying the latest release of revision 1201 firmware (using the disk download utility) and a subsequent reboot, an indeterminate number of drives may go offline/disabled, causing the affected volumes to unmount. This results in loss of access to those volumes and possible loss of data integrity. (An example of offline drives is shown below in the "Relief/Workaround" section, steps 3-4.)
4. Workaround
To work around the described issue, first document the volume and volume slicing configurations (volumes and volume slices must be recreated EXACTLY as they were before this workaround to ensure data integrity).
The following example may not match a particular environment, so adjust the procedure to match the actual configuration (that is, apply each command to the drives that are actually offline/disabled and recreate the volumes as they actually existed). The example illustrates a T3+ "partner pair" running 3.1.4 with 1 volume per tray (8+1 RAID5 with drive 9 as a standby). In this example, the "disk download" procedure for the firmware upgrade has already been done and a subsequent reset has already taken place.
1. Run the "vol list" command and note each volume configuration. This information will be used later to recreate the volumes EXACTLY as they existed before the "disk download" command was issued to the drives. An example:
hws27-44:/:<x> vol list
volume capacity raid data standby
v0 477.192 GB 5 u1d01-08 u1d09
v1 477.192 GB 5 u2d01-08 u2d09
...
2. Run the "sys list" command. If "enable_volslice = off", then you can skip to step 3. If "enable_volslice = on", then run the "volslice list" command and note each slice configuration. Also record this information to be used later to recreate slices EXACTLY as they existed before the "disk download" procedure for firmware upgrade was issued to the drives.
3. Run "fru list" from the command line of the T3+ master controller, as in the following example:
hws27-44:/:<x> fru list
ID TYPE VENDOR MODEL REVISION SERIAL
------ ----------------- ----------- ----------- ------------- --------
u1ctr controller card 0x301 5015710 50 101421
u2ctr controller card 0x301 5015710 50 101125
u1d01 disk drive - - - -
u1d02 disk drive - - - -
u1d03 disk drive <OEM Name> ST173404FSUN A42D 3CE07N5N
...
...
u1d09 disk drive <OEM Name> MAP3735F SUN 1201 P29001U9
u2d01 disk drive - - - -
u2d02 disk drive - - - -
...
u2d08 disk drive - - - -
u2d09 disk drive <OEM Name> MAP3735F SUN 1201 P290026J
...
u1l1 loop card 0x301 3750085 G 068718
u2mpn mid plane 0x301 3703990 E 013492
4. Note which drives are offline (as indicated by a "-" in the VENDOR, MODEL, REVISION, and SERIAL fields). In the above example, u1d01, u1d02, u2d01, u2d02, and u2d08 are offline. (Results may differ, depending on the configuration.) At this point, physically remove and re-insert all the drives that are offline. After the drives are re-inserted, it could take a considerable amount of time for them to spin up and for system areas to be updated. Run the "fru stat" command repeatedly to verify that the drives are spun up and online, as in the following example:
hws27-44:/:<11> fru stat
ID TYPE VENDOR MODEL REVISION SERIAL
------ ----------------- ----------- ----------- ------------- --------
u1ctr controller card 0x301 5015710 50 101421
u2ctr controller card 0x301 5015710 50 101125
u1d01 disk drive <OEM Name> MAP3735F SUN 1201 P29001DB
u1d02 disk drive <OEM Name> MAP3735F SUN 1201 P2900057
u1d03 disk drive <OEM Name> ST173404FSUN A42D 3CE07N5N
...
u2d01 disk drive <OEM Name> MAP3735F SUN 1201 P29001UG
u2d02 disk drive <OEM Name> MAP3735F SUN 1201 P29001TT
...
5. Once all the drives are online and listed in the "fru stat" output, remove all affected volumes (this clears the disabled state of any drive in the volume). To do this, run the following commands:
hws27-44:/:<x> vol remove v0
hws27-44:/:<x> vol remove v1
...
6. Now the volumes must be created EXACTLY as they were before. Failure to recreate volumes EXACTLY can cause loss of data integrity on those volumes. To do this, run the "vol add" command, as in the following example:
hws27-44:/:<x> vol add v0 data u1d1-8 raid 5 standby u1d9
hws27-44:/:<x> vol add v1 data u2d1-8 raid 5 standby u2d9
...
7. To initialize the volumes, gain command-line (CLI) access to the "dot" commands by issuing the appropriate command, followed by the password, as in this example:
hws27-44:/:<x> sun
hws27-44:/:<x> <password>
Then answer "yes" when prompted by each of these commands:
hws27-44:/:<x> .vol init v0 fast
hws27-44:/:<x> .vol init v1 fast
...
8. Now mount the volumes using the following commands:
hws27-44:/:<x> vol mount v0
hws27-44:/:<x> vol mount v1
...
If "volslicing" is NOT enabled ("enable_volslice = off" in step (2)), then you are DONE. If "volslicing" IS enabled ("enable_volslice = on" in step (2)), then recreate the slices EXACTLY as they were before, using the information gathered in steps 1 and 2. Failure to recreate the slices EXACTLY as they were may result in a loss of data integrity.
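Because the volumes must be recreated EXACTLY, it can help to derive the "vol add" lines mechanically from the saved "vol list" output rather than retyping them. The following Python sketch runs on an administration host against captured text and does not talk to the array. All function names are hypothetical. It assumes the column layouts shown in the examples in steps 1 and 3, strips leading zeros from drive IDs only to mirror the example in step 6, and treats a standby column of "none" as "no standby", which is an assumption not confirmed by this document. Always compare the generated commands against the recorded configuration before using them.

```python
import re

VOL_LIST = """\
volume capacity raid data standby
v0 477.192 GB 5 u1d01-08 u1d09
v1 477.192 GB 5 u2d01-08 u2d09
"""

FRU_LIST = """\
u1ctr controller card 0x301 5015710 50 101421
u1d01 disk drive - - - -
u1d02 disk drive - - - -
u1d03 disk drive <OEM Name> ST173404FSUN A42D 3CE07N5N
u2d08 disk drive - - - -
"""


def parse_vol_list(text):
    """Return one dict per volume row found in captured "vol list" output."""
    volumes = []
    for line in text.splitlines():
        tokens = line.split()
        # Row shape assumed: name, size, unit, raid level, data range, standby.
        if len(tokens) == 6 and re.fullmatch(r"u\d+d\d+-\d+", tokens[4]):
            volumes.append({"name": tokens[0], "capacity": f"{tokens[1]} {tokens[2]}",
                            "raid": tokens[3], "data": tokens[4], "standby": tokens[5]})
    return volumes


def strip_zeros(drive_spec):
    """Turn u1d01-08 into u1d1-8, matching the form used in the step 6 example."""
    return re.sub(r"(?<=[d-])0+(?=\d)", "", drive_spec)


def vol_add_command(vol):
    """Build a "vol add" line in the same form as the step 6 example."""
    cmd = f"vol add {vol['name']} data {strip_zeros(vol['data'])} raid {vol['raid']}"
    if vol["standby"].lower() != "none":  # assumption: "none" means no standby
        cmd += f" standby {strip_zeros(vol['standby'])}"
    return cmd


def offline_drives(text):
    """Return IDs of disk drives whose four identity fields are all "-"."""
    offline = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens[1:3] == ["disk", "drive"] and tokens[3:] == ["-"] * 4:
            offline.append(tokens[0])
    return offline


for vol in parse_vol_list(VOL_LIST):
    print(vol_add_command(vol))
print("offline:", ", ".join(offline_drives(FRU_LIST)))
```

The same `offline_drives` check can be rerun on later captures to confirm that no drive still shows "-" fields before the volumes are removed in step 5.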
5. Resolution
There are no further updates planned for this Sun Alert document. If
you need additional assistance regarding this issue, please contact Sun
Services.
This Sun Alert notification is being provided to you on an "AS IS"
basis. This Sun Alert notification may contain information provided by
third parties. The issues described in this Sun Alert notification may
or may not impact your system(s). Sun makes no representations,
warranties, or guarantees as to the information contained herein. ANY
AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR
NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU
ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT,
INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT
OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This
Sun Alert notification contains Sun proprietary and confidential
information. It is being provided to you pursuant to the provisions of
your agreement to purchase services from Sun, or, if you do not have
such an agreement, the Sun.com Terms of Use. This Sun Alert
notification may only be used for the purposes contemplated by these
agreements.
Copyright 2000-2008 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.
Modification History
22-SEP-2004: Additional "notes" 4, 5, 6, and 7 added to "Contributing Factors" section
31-Mar-2008: No further updates. Resolved.