Certain sscs(1M) Commands, Array/StorEdge 3900SL CLI Commands, or Certain StorEdge 3900SL/6320/6130 GUI Actions May Cause Loss of Connectivity to a Host(s)
Category: Availability
Release Phase: Resolved
Product: Sun StorageTek 3900 Series, Sun StorageTek T3 Array, Sun StorageTek T3+ Array, Sun StorageTek 6120 Array, Sun StorageTek 6320 System, Sun StorageTek 6130 Array
Bug Id: 6197128, 6202414
Date of Workaround Release: 14-JAN-2005
Date of Resolved Release: 01-JUN-2005
Impact
Under rare conditions, managing a Sun StorEdge 3900SL/6120/6130/6320/T3+ Array with certain sscs(1M) commands, array/StorEdge 3900SL CLI commands, or certain StorEdge 3900SL/6320/6130 GUI actions may cause loss of connectivity to one or more hosts. The issue applies when the array is attached via certain Fibre Channel (FC) switches running certain switch firmware releases (both listed below), and the Host Bus Adapters (HBAs) use the Sun QLC HBA driver. Use of these commands can cause a path failure, which could lead to a complete loss of host access to the array.
Contributing Factors
This issue can occur on the following platforms:
SPARC Platform
- Sun StorEdge 3900SL Array
- Sun StorEdge 6120/6130/6320 Arrays
- Sun StorEdge T3+ Array
connected to the following switch models:
- SG-XSWBRO3200 - 3200 switch with 8 ports with FabOS 3.1.2a (as delivered with firmware patch 115360-03) and without patch 115360-05
- SG-XSWBRO3200 - 3200 switch with 8 ports with FabOS 3.1.3 (as delivered with firmware patch 115360-04) and without patch 115360-05
- SG-XSWBRO3800 - 3800 switch with 16 ports with FabOS 3.1.2a (as delivered with firmware patch 115360-03) and without patch 115360-05
- SG-XSWBRO3800 - 3800 switch with 16 ports with FabOS 3.1.3 (as delivered with firmware patch 115360-04) and without patch 115360-05
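As a rough aid, the switch conditions listed above can be expressed as a small check. This is a sketch only: the function name and its boolean patch argument are hypothetical, and it covers only the switch model and FabOS release, not the array type or the QLC HBA driver requirement.

```python
# Illustrative only: these names are hypothetical and not part of any Sun tool.
AFFECTED_MODELS = {"SG-XSWBRO3200", "SG-XSWBRO3800"}
AFFECTED_FABOS = {"3.1.2a", "3.1.3"}


def switch_is_affected(model, fabos, has_patch_115360_05):
    """Return True if the switch matches a configuration listed in this alert.

    has_patch_115360_05 is supplied by the caller; this sketch does not
    query the switch for its patch level.
    """
    if has_patch_115360_05:
        return False
    return model in AFFECTED_MODELS and fabos in AFFECTED_FABOS
```

For example, `switch_is_affected("SG-XSWBRO3800", "3.1.3", False)` evaluates to `True`, while the same switch with patch 115360-05 applied evaluates to `False`.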
The described issue may occur in the configurations listed above when any of the following sscs(1M) commands or array/StorEdge 3900SL CLI commands are issued:
sscs(1M) commands:
- sscs modify volgroup
- sscs create volume
- sscs create initiator
- sscs create pool
- sscs modify array
- sscs add initgroup
- sscs map
StorEdge 6120/T3+ telnet(1) commands:
- lun perm
- hwwn
- volslice
- vol mount
- sys mp_support
StorEdge 3900SL Service Processor (SP) CLI commands:
The following menu options in the program "/opt/SUNWsecfg/runsecfg" :
- 3) Configure Sun StorEdge T3+ Array(s)
- 6) Modify Sun StorEdge T3+ Array Sys Parameters
- 8) Manage Sun StorEdge T3+ Array LUN Slicing
- 9) Manage Sun StorEdge T3+ Array LUN Masking
The following commands from the directory "/opt/SUNWsecfg/bin" on the Service Processor (SP):
- createt3group
- addtot3group
- delfromt3group
- rmt3group
- createt3slice
- rmt3slice
- modifyt3config
- savet3config
- modifyt3params
- sett3lunperm
Notes:
1. StorEdge 3900SL/6130/6320 GUI actions equivalent to these commands may also cause the issue to occur.
2. The following Read-Only commands will not trigger the described issue:
- lun perm list
- hwwn list
- hwwn listgrp
- volslice list
3. The described issue may be encountered only under the above mentioned conditions.
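The command lists and the read-only exceptions above could be encoded as a simple pre-check before running a management command. The sketch below is an assumption-laden illustration: the function name is hypothetical, matching is done on whole leading words, and the runsecfg menu options and GUI actions are not covered.

```python
# Command prefixes taken from the lists in this alert.
TRIGGER_PREFIXES = (
    "sscs modify volgroup", "sscs create volume", "sscs create initiator",
    "sscs create pool", "sscs modify array", "sscs add initgroup", "sscs map",
    "lun perm", "hwwn", "volslice", "vol mount", "sys mp_support",
    "createt3group", "addtot3group", "delfromt3group", "rmt3group",
    "createt3slice", "rmt3slice", "modifyt3config", "savet3config",
    "modifyt3params", "sett3lunperm",
)
# Read-only commands that the alert says will not trigger the issue.
READ_ONLY_PREFIXES = ("lun perm list", "hwwn list", "hwwn listgrp", "volslice list")


def _starts_with(words, prefix):
    """True if the word list begins with every word of the prefix."""
    prefix_words = prefix.split()
    return words[:len(prefix_words)] == prefix_words


def may_trigger_issue(command_line):
    """Return True if the command line matches a listed, non-read-only command."""
    words = command_line.split()
    if not words:
        return False
    # Assumption: a leading directory such as /opt/SUNWsecfg/bin/ is ignored.
    words[0] = words[0].rsplit("/", 1)[-1]
    if any(_starts_with(words, p) for p in READ_ONLY_PREFIXES):
        return False
    return any(_starts_with(words, p) for p in TRIGGER_PREFIXES)
```

A result of `False` only means the command is not on the lists in this alert; it is not a guarantee that the command is safe in every configuration.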
Symptoms
If the described issue occurs on hosts running the Sun "fp" and "mpxio" drivers, the following will be displayed in the array syslog: "PLOGI timeout" messages, and host messages from STMS reporting that LUNs are being offlined and that the paths allowing access to those LUNs are now degraded due to the loss of one path.
Example message with FabOS 3.1.2a:
[date time hostname] fp: [ID 517869 kern.info] NOTICE: fp(1): PLOGI to 10f00 failed state=Timeout,
reason=Hardware Error...
[date time hostname] PLOGI to D_ID=0x10f00 failed: State:Timeout, Reason:Hardware Error. Giving up
[date time hostname] scsi: [ID 243001 kern.info] /pci@1d,700000/SUNW,qlc@1,1/fp@0,0 (fcp1):
[date time hostname] offlining lun=b (trace=0), target=10f00 (trace=2800101)
[date time hostname] mpxio: [ID 669396 kern.info] /scsi_vhci/ssd@g60003ba4e7fbe00041862b4700047ffc (ssd1)
multipath status: degraded, path /pci@1d,700000/SUNW,qlc@1,1/fp@0,0 (fp1) to target address: 20030003ba4e7fbe,b
is offline. Load balancing: round-robin
Note: The above are examples only. On each system, the LUN numbers, target numbers, and device paths will vary. To identify that this issue is being seen, check the target trace value ("trace=2800101" above) and the overall sequence of events, where many LUNs fail over and a path is reported to be "offline" after performing any of the commands listed in the Contributing Factors section.
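The message pattern described above can be searched for in saved log text. The following is a minimal sketch, assuming the messages have the same shape as the FabOS 3.1.2a example; the function name and regular expressions are illustrative and may need adjusting for other systems.

```python
import re

# Patterns modeled on the example messages in this alert (an assumption).
PLOGI_RE = re.compile(r"PLOGI to (?:D_ID=0x)?(?P<d_id>[0-9a-f]+) failed")
OFFLINE_RE = re.compile(
    r"offlining lun=(?P<lun>[0-9a-f]+) \(trace=[0-9a-f]+\), "
    r"target=(?P<target>[0-9a-f]+) \(trace=(?P<trace>[0-9a-f]+)\)"
)


def scan_messages(lines, target_trace="2800101"):
    """Collect PLOGI failure targets and offlined (target, lun) pairs.

    Only "offlining lun" lines whose target trace equals target_trace
    are reported, since that value is the indicator named in the note.
    """
    plogi_targets = set()
    offlined = []
    for line in lines:
        match = PLOGI_RE.search(line)
        if match:
            plogi_targets.add(match.group("d_id"))
        match = OFFLINE_RE.search(line)
        if match and match.group("trace") == target_trace:
            offlined.append((match.group("target"), match.group("lun")))
    return plogi_targets, offlined
```

A PLOGI failure together with offlined LUNs on the same target, shortly after one of the listed commands was run, would be consistent with this issue; the script only gathers the evidence and does not make that judgment.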
Alternatively, on a switch running FabOS 3.1.3, the switch port to the host and storage may become unresponsive. This is identified by issuing a "cfgadm -al -o show_FCP_dev" command, in which the devices listed via this path may be marked with a Condition of "failing", or the controller itself may revert to a Type of "fc". The switch CLI will also show symptoms: if a "switchshow" is performed, the command will not return and will "freeze" on the port that was connected to the storage being manipulated.
Example message with FabOS 3.1.3:
Ap_Id Type Receptacle Occupant Condition
c3 fc-fabric connected configured unknown
c3::20030003ba4e7fbe,0 disk connected configured failing
c3::20030003ba4e7fbe,1 disk connected configured failing
c3::20030003ba4e7fbe,2 disk connected configured failing
c3::20030003ba4e7fbe,3 disk connected configured failing
c3::20030003ba4e7fbe,4 disk connected configured failing
c3::20030003ba4e7fbe,5 disk connected configured failing
c3::20030003ba4e7fbe,6 disk connected configured failing
c3::20030003ba4e7fbe,7 disk connected configured failing
c3::20030003ba4e7fbe,8 disk connected configured failing
c3::20030003ba4e7fbe,9 disk connected configured failing
c4 fc-fabric connected configured unknown
c4::20030003ba4e8ad1,0 disk connected configured unknown
c4::20030003ba4e8ad1,1 disk connected configured unknown
c4::20030003ba4e8ad1,2 disk connected configured unknown
c4::20030003ba4e8ad1,3 disk connected configured unknown
c4::20030003ba4e8ad1,4 disk connected configured unknown
c4::20030003ba4e8ad1,5 disk connected configured unknown
c4::20030003ba4e8ad1,6 disk connected configured unknown
c4::20030003ba4e8ad1,7 disk connected configured unknown
c4::20030003ba4e8ad1,8 disk connected configured unknown
c4::20030003ba4e8ad1,9 disk connected configured unknown
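To pick out the affected controller from a long listing, the "failing" entries can be grouped by controller. This sketch assumes the five-column layout shown in the example output above; the function name is hypothetical.

```python
from collections import defaultdict


def failing_devices_by_controller(cfgadm_output):
    """Group device Ap_Ids whose Condition is "failing" by controller.

    Assumes each data line has five whitespace-separated columns:
    Ap_Id, Type, Receptacle, Occupant, Condition.
    """
    failing = defaultdict(list)
    for line in cfgadm_output.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] == "Ap_Id":
            continue  # skip the header and anything not in the expected layout
        ap_id, condition = fields[0], fields[4]
        if "::" in ap_id and condition == "failing":
            controller, device = ap_id.split("::", 1)
            failing[controller].append(device)
    return dict(failing)
```

Applied to the example above, the result would contain only the key "c3", with its ten LUN entries, since every device under c4 has a Condition of "unknown".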
Example switch CLI output with FabOS 3.1.3:
Switch:admin> switchshow
switchName: Switch
switchType: 9.2
switchState: Online
switchMode: Native
switchRole: Principal
switchDomain: 1
switchId: fffc01
switchWwn: 10:00:00:60:69:51:8a:a3
switchBeacon: OFF
Zoning: ON (example)
port 0: id N1 Online F-Port 21:00:00:e0:8b:0c:12:15
port 1: id N1 Online F-Port 21:00:00:e0:8b:0c:6e:16
port 2: id AN No_Sync
port 3: -- N2 No_Module
port 4: id N2 Online F-Port 21:01:00:e0:8b:27:81:b4
port 5: -- N2 No_Module
port 6: id 1G Online F-Port 50:02:0f:23:00:00:06:f2
port 7: id N1 Online F-Port 50:02:0f:23:00:01:08:bb
port 8: id N2 No_Light
port 9: id N2 No_Light
port 10: id AN No_Sync
port 11: id AN No_Sync
port 12: id N2 Online F-Port 21:01:00:e0:8b:37:6b:12
port 13: id N2 Online F-Port 21:00:00:e0:8b:17:3b:14
port 14: id N2 Online F-Port 21:00:00:e0:8b:17:6b:18
** Note that the command did not return to the CLI prompt and that port 15 (attached to the storage being manipulated) is missing **
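If the partial "switchshow" output was captured (for example from a terminal session log, which is an assumption here), the ports that never appeared can be listed mechanically. The function name and the need to pass the expected port count are illustrative choices.

```python
import re

# Matches lines such as "port 7: id N1 Online F-Port ..."
PORT_RE = re.compile(r"^\s*port\s+(\d+):")


def missing_ports(switchshow_output, port_count):
    """Return the port numbers in 0..port_count-1 absent from the output."""
    seen = set()
    for line in switchshow_output.splitlines():
        match = PORT_RE.match(line)
        if match:
            seen.add(int(match.group(1)))
    return sorted(set(range(port_count)) - seen)
```

For the 16-port 3800 switch in the example above, where ports 0 through 14 are listed, calling the function with a port count of 16 would return `[15]`.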
Workaround
If the above issue occurs, wait for LUN failovers to complete and follow the recommendations shown below:
On the host(s) where the above STMS "offlining lun" and "multipath status: degraded" messages were seen, run the following luxadm(1M) command as root:
# luxadm -e port
Found path to 2 HBA ports
/devices/pci@1d,700000/SUNW,qlc@1/fp@0,0:devctl CONNECTED
/devices/pci@1d,700000/SUNW,qlc@1,1/fp@0,0:devctl NOT CONNECTED
To reconnect the path, issue the following "luxadm -e forcelip" command for the path that the STMS error message reported as "multipath status: degraded".
In the example above, the error occurred on "/pci@1d,700000/SUNW,qlc@1,1/fp@0,0" so the following command is used:
# luxadm -e forcelip /devices/pci@1d,700000/SUNW,qlc@1,1/fp@0,0:devctl
After running "luxadm -e forcelip" on the path(s) required above, you can confirm that all paths are now usable by running "luxadm -e port" again as shown below:
# luxadm -e port
Found path to 2 HBA ports
/devices/pci@1d,700000/SUNW,qlc@1/fp@0,0:devctl CONNECTED
/devices/pci@1d,700000/SUNW,qlc@1,1/fp@0,0:devctl CONNECTED
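On hosts with many HBA ports, the check above can be scripted. The sketch below only parses saved "luxadm -e port" output and builds the corresponding "luxadm -e forcelip" command strings; it does not execute anything. It assumes the output format shown in this alert, and it treats every "NOT CONNECTED" path as a candidate, which should still be confirmed against the path named in the STMS message.

```python
def forcelip_commands(luxadm_port_output):
    """Build a forcelip command string for each path reported NOT CONNECTED."""
    marker = "NOT CONNECTED"
    commands = []
    for line in luxadm_port_output.splitlines():
        line = line.strip()
        if line.endswith(marker):
            path = line[: -len(marker)].strip()
            commands.append("luxadm -e forcelip " + path)
    return commands
```

Returning strings rather than running the commands leaves the decision to force a LIP with the administrator, which matches the manual procedure described above.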
With FabOS version 3.1.3, if the switch port has become unresponsive then the switch will require a reboot to restore connectivity on this path. This action will potentially affect the connectivity of other hosts to the storage so it is imperative to ensure that the switch has indeed become unresponsive and that any other host on this switch has an alternative path to the storage, prior to resetting the affected switch.
Resolution
This issue is addressed in the following releases:
SPARC Platform
- Sun StorEdge 3900SL Array
- Sun StorEdge 6120/6130/6320 Arrays
- Sun StorEdge T3+ Array
connected to the following switch models:
- SG-XSWBRO3200 - 3200 switch with 8 ports with patch 115360-05
- SG-XSWBRO3800 - 3800 switch with 16 ports with patch 115360-05
Modification History
Date: 01-JUN-2005
- State: Resolved
- Updated Contributing Factors and Relief/Workaround sections
Date: 15-JUL-2005
- Updated the Contributing Factors and the Resolution sections
Attachments
This solution has no attachment