Brocade Switches May Panic With "Kernel Access of Bad Area" Error
Category: Availability
Release Phase: Resolved
Bug Id: Brocade ID 000075449
Product: Brocade Switches
Date of Resolved Release: 16-MAY-2007
Impact
A Linux kernel issue, triggered when a kernel process is spawned by a non-root user, may cause certain Brocade switches to panic and reboot, resulting in a temporary loss of connectivity to SAN devices.
Contributing Factors
This issue can occur on the following platforms:
- SG-XSWBRO200E 8P SilkWorm 200E switch
- SG-XSWBRO3250 SilkWorm 3250 switch
- SG-XSWBRO3850 SilkWorm 3850 switch
- SG-XSWBRO3900 SilkWorm 3900 switch
- SG-XSWBRO4100 SilkWorm 4100 switch
- SG-XSWBRO4900 SilkWorm 4900 switch
- SG-XSWBRO24K-32P SilkWorm 24000 Director
- SG-XSWBRO48ZP SilkWorm 48000 Director
without FOS 5.2.0b or later (5.2.1b is delivered in patch 124898-03)
and:
- SG-XSWBRO12000-32P/64P SilkWorm 12000 Director
without FOS 5.0.5c (as delivered in patch 119552-05)
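The exposure check described above can be sketched as a small version comparison. This is only an illustration: the family labels in `FIXED_FOS` and the function names are hypothetical, and the sketch assumes that FOS levels order by their numeric fields first and by the trailing letter second, and that a later level than the one listed is also fixed. An optional leading "v" is accepted on the assumption that the switch reports its level that way.

```python
import re

# Minimum fixed FOS levels taken from the lists above. The dictionary keys
# are hypothetical labels used only in this sketch.
FIXED_FOS = {
    "silkworm_12000": "5.0.5c",  # SilkWorm 12000 Director
    "other_listed": "5.2.0b",    # all other platforms listed above
}


def parse_fos(version):
    """Split a FOS level such as '5.2.0b' into a comparable tuple."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)([a-z]?)", version.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized FOS version: {version!r}")
    major, minor, patch, letter = match.groups()
    # Assumption: numeric fields compare first, then the letter suffix,
    # with no suffix sorting before any letter.
    return (int(major), int(minor), int(patch), letter)


def is_exposed(family, running_version):
    """Return True if the running FOS level is below the fixed level."""
    return parse_fos(running_version) < parse_fos(FIXED_FOS[family])
```

For example, `is_exposed("other_listed", "5.2.0a")` evaluates to True, while `is_exposed("other_listed", "5.2.1b")` evaluates to False.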
Symptoms
Should the described issue occur, the switch will panic and reboot. Details of the activity will be logged in the switch internal logs and can be viewed after the reboot.
The "errdump" or "errshow" command will show entries indicating both that the switch rebooted and that a panic dump occurred:
SWITCH-DATE-AND-TIME, [PDTR-1001], 26,, INFO, ?, pdcheck: info: found new pd i0
SWITCH-DATE-AND-TIME, [MFIC-1002], 26,, INFO, ?, Chassis FRU header not program.
SWITCH-DATE-AND-TIME, [HAM-1004], 27,, INFO, ?, Switch reboot, reason: Unknown
In a "pdshow" output collection, the console log will contain "kernel access of bad area", similar to the following abbreviated extract:
Oops: kernel access of bad area, sig: 11
NIP: C0032308 XER: 20000000 LR: C00322C0 SP: C3BF7EA0 REGS: c3bf7de0 TRAP: 0800 Not tainted
MSR: 00021030 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c3bf6000[933] 'fwd1' Last syscall: 114
last math 00000000 last altivec 00000000
PLB0: bear= 0x25298018 acr= 0x00000000 besr= 0x00000000
PLB0 to OPB: bear= 0x10008081 besr0= 0x00000000 besr1= 0x00000000
GPR00: 00000000 C3BF7EA0 C3BF6000 00000005 00029030 00000000 C3BF7F04 00000020
GPR08: C03248EC D8365470 00000000 00000020 30026978 106487F0 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00029032 03BF7F30 00000000 00000001
GPR24: 106B7EA8 00000001 C0201FF0 00000000 FFFFFFFF 060102E1 000068B5 C0BC1440
Call backtrace:
00000000 C0020FA0 C00199B4 C001ABE8 C0004BBC 00000004 0F484064
0F483444 0F4197A0
>>NIP; c0032308 <kmem_cache_free+0x6c/0x104 [kernel]> <=====
Trace; 00000000 <Unknown (0x0)>
Trace; c0020fa0 <free_uid+0x54/0x64 [kernel]>
Trace; c00199b4 <release_task+0x40/0x198 [kernel]>
Trace; c001abe8 <sys_wait4+0x318/0x39c [kernel]>
Trace; c0004bbc <ret_from_syscall_1+0x0/0xb4 [kernel]>
The presence of the following trace entry gives final confirmation that the condition has occurred:
Trace; c0020fa0 <free_uid+0x54/0x64 [kernel]>
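The confirmation steps above can be sketched as a text check on a saved copy of the console log from a "pdshow" collection. This is a minimal sketch, not a vendor tool: the function name `matches_signature` is hypothetical, and it assumes the console log has already been captured as plain text. It looks for both the "kernel access of bad area" message and a trace entry in `free_uid`.

```python
import re

# Message printed at the start of the panic output.
OOPS_RE = re.compile(r"kernel access of bad area", re.IGNORECASE)

# Trace entry naming free_uid. Not anchored to the start of a line, because
# captured console logs are sometimes re-wrapped.
FREE_UID_RE = re.compile(r"Trace;\s+[0-9a-f]+\s+<free_uid\+", re.IGNORECASE)


def matches_signature(console_text):
    """Return True if the text shows both indicators described above."""
    text = console_text.replace("\r", "")  # drop carriage returns from serial capture
    return bool(OOPS_RE.search(text)) and bool(FREE_UID_RE.search(text))
```

Requiring both patterns keeps the check aligned with the text: the "bad area" message alone identifies a panic, while the `free_uid` entry is what confirms this particular condition.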
Workaround
There is no workaround for this issue. Please see the Resolution section below.
Resolution
This issue is addressed on the following platforms:
- SG-XSWBRO200E 8P SilkWorm 200E switch
- SG-XSWBRO3250 SilkWorm 3250 switch
- SG-XSWBRO3850 SilkWorm 3850 switch
- SG-XSWBRO3900 SilkWorm 3900 switch
- SG-XSWBRO4100 SilkWorm 4100 switch
- SG-XSWBRO4900 SilkWorm 4900 switch
- SG-XSWBRO24K-32P SilkWorm 24000 Director
- SG-XSWBRO48ZP SilkWorm 48000 Director
with FOS 5.2.0b or later (5.2.1b is delivered in patch 124898-03)
and:
- SG-XSWBRO12000-32P/64P SilkWorm 12000 Director
with FOS 5.0.5c (as delivered in patch 119552-05 or later)
Modification History
Date: 03-JUL-2007
- Updated Contributing Factors and Resolution sections
Attachments
This solution has no attachments.