Hi everyone.
On Friday we had a SAN failure. I logged a job with EMC, but they had no clue because nothing was obvious.
They did notice that both storage processors panicked at the same time, and the EMC development engineers are looking into it. Needless to say, pretty much everything turned to custard: a lot of VMs were unhappy and pretty much needed a cold reboot.
On one of our hosts I have lost one of the LUNs. There are two LUNs available on the SAN, but ESX seems to detect the same LUN twice, and I obviously have issues with my paths.
Some related output:
- esxcfg-mpath -l
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.a2 vmhba32:2:0 On active preferred
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.b2 vmhba32:6:0 On
Disk vmhba0:0:0 /dev/cciss/c0d0 (69973MB) has 1 paths and policy of Fixed
Local 6:0.0 vmhba0:0:0 On active preferred
Disk vmhba32:3:1 /dev/sdb (512000MB) has 2 paths and policy of Most Recently Used
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.a3 vmhba32:3:1 Standby preferred
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.b3 vmhba32:7:1 On active
Disk vmhba32:3:3 (512000MB) has 2 paths and policy of Most Recently Used
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.a3 vmhba32:3:3 Dead preferred
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.b3 vmhba32:7:3 Dead
Disk vmhba32:3:0 /dev/sda (512000MB) has 2 paths and policy of Most Recently Used
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.a3 vmhba32:3:0 Standby preferred
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.b3 vmhba32:7:0 On active
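To read the path listing above at a glance, here is a small Python sketch that groups the esxcfg-mpath -l lines by disk and flags any disk with no live path. It assumes the ESX text layout shown above (a "Disk ..." header followed by one line per path, with the path state right after the vmhbaA:T:L name); the function names are hypothetical and this is not a VMware tool.

```python
import re

# Disk header line, e.g.
# "Disk vmhba32:3:1 /dev/sdb (512000MB) has 2 paths and policy of Most Recently Used".
# The /dev node is optional: vmhba32:3:3 is listed without one.
DISK_RE = re.compile(r"^Disk (\S+)(?: (/dev/\S+))? \((\d+)MB\) has (\d+) paths")
# Path line: capture the vmhbaA:T:L name and the first state word after it,
# e.g. "... vmhba32:3:3 Dead preferred" -> ("vmhba32:3:3", "Dead").
PATH_RE = re.compile(r"(vmhba\d+:\d+:\d+) (\w+)")


def summarize_paths(text):
    """Map each disk header to a list of (path, state) tuples.

    Path lines that appear before any "Disk" header (like the first two
    lines of the paste, whose header was cut off) are ignored.
    """
    disks = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = DISK_RE.match(line)
        if header:
            current = header.group(1)
            disks[current] = []
            continue
        path = PATH_RE.search(line)
        if path and current is not None:
            disks[current].append((path.group(1), path.group(2)))
    return disks


def dead_disks(disks):
    """Return the disks whose every listed path is in the Dead state."""
    return [d for d, paths in disks.items()
            if paths and all(state == "Dead" for _, state in paths)]


# Excerpt of the esxcfg-mpath -l output above.
SAMPLE = """\
Disk vmhba32:3:1 /dev/sdb (512000MB) has 2 paths and policy of Most Recently Used
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.a3 vmhba32:3:1 Standby preferred
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.b3 vmhba32:7:1 On active
Disk vmhba32:3:3 (512000MB) has 2 paths and policy of Most Recently Used
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.a3 vmhba32:3:3 Dead preferred
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.b3 vmhba32:7:3 Dead
Disk vmhba32:3:0 /dev/sda (512000MB) has 2 paths and policy of Most Recently Used
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.a3 vmhba32:3:0 Standby preferred
iScsi sw iqn.1998-01.com.vmware:localhost-3055e03e<->iqn.1992-04.com.emc:cx.ck200064601253.b3 vmhba32:7:0 On active
"""

print(dead_disks(summarize_paths(SAMPLE)))
```

On this excerpt it prints ['vmhba32:3:3'], i.e. the LUN whose paths through both SP A (a3) and SP B (b3) are Dead.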
- esxcfg-vmhbadevs
vmhba32:3:0 /dev/sda
vmhba32:3:1 /dev/sdb
These two appear to be the same physical LUN, though.
Some vmkernel log data:
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.003 cpu5:1043)WARNING: SCSI: 4541: Delaying failover to path vmhba32:7:3
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.004 cpu1:1025)SCSI: 5270: vml.020003000060060160a2a01a007eea3c745c6edd11524149442035: Cmd failed. Blocking device during path failover.
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.006 cpu2:1058)SCSI: 2741: Could not locate path to peer SP for CX SP B path vmhba32:7:3.
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.006 cpu2:1058)SCSI: 2741: Could not locate path to peer SP for CX SP B path vmhba32:7:3.
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.006 cpu2:1058)SCSI: 2308: Unmapped LUN state for DGC path vmhba32:7:3
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.007 cpu2:1058)SCSI: 2308: Unmapped LUN state for DGC path vmhba32:3:3
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.007 cpu2:1058)WARNING: SCSI: 4559: Manual switchover to path vmhba32:7:3 begins.
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.007 cpu2:1058)SCSI: 2308: Unmapped LUN state for DGC path vmhba32:7:3
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.007 cpu2:1058)WARNING: SCSI: 3743: Could not switchover to vmhba32:7:3. Check Unit Ready Command returned an error instead of NOT READY for standby controller .
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.007 cpu2:1058)WARNING: SCSI: 4619: Manual switchover to vmhba32:7:3 completed unsuccessfully.
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.010 cpu5:1057)SCSI: 2741: Could not locate path to peer SP for CX SP B path vmhba32:7:3.
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.010 cpu5:1057)SCSI: 2741: Could not locate path to peer SP for CX SP B path vmhba32:7:3.
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.010 cpu5:1057)SCSI: 2308: Unmapped LUN state for DGC path vmhba32:7:3
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.010 cpu5:1057)SCSI: 2308: Unmapped LUN state for DGC path vmhba32:3:3
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.010 cpu5:1057)WARNING: SCSI: 4559: Manual switchover to path vmhba32:3:3 begins.
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.010 cpu5:1057)iSCSI: session 0xba402c0 eh_device_reset at 1589761539 for command 0x6636888 to (0 0 3 3), cdb 0x0
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.010 cpu2:1081)iSCSI: session 0xba402c0 requested target reset for (0 0 3 *), warm reset itt 25080319 at 1589761539
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.016 cpu6:1082)iSCSI: session 0xba402c0 warm target reset success for mgmt 25080319 at 1589761539
Feb 21 14:01:03 tur-esx-dev1 vmkernel: 184:00:00:15.017 cpu2:1081)iSCSI: session 0xba402c0 (0 0 3 *) finished reset at 15897
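The vml identifier in the "Cmd failed. Blocking device during path failover." line can also be pulled apart to see which LUN it refers to. Below is a minimal Python sketch, assuming the field layout described in its docstring (an inference from the string itself, not something VMware documents in this thread; decode_vml is a hypothetical helper). Under that assumption the identifier decodes to LUN 3, which matches the dead vmhba32:3:3 / vmhba32:7:3 paths, and the trailing bytes spell the ASCII string "RAID 5".

```python
def decode_vml(vml_id):
    """Split an ESX vml identifier into (LUN, NAA id, ASCII tail).

    ASSUMED layout (inferred, not confirmed by VMware here):
      "vml." + 2 hex type + 2 hex + 2 hex LUN + 4 hex + 32 hex NAA id
      + remaining hex bytes, decoded here as ASCII.
    """
    hexpart = vml_id[4:] if vml_id.startswith("vml.") else vml_id
    lun = int(hexpart[4:6], 16)           # third byte: LUN number
    naa = hexpart[10:42]                  # 16-byte NAA identifier
    tail = bytes.fromhex(hexpart[42:]).decode("ascii", errors="replace")
    return lun, naa, tail


# The identifier from the "Cmd failed" vmkernel line above.
VML = "vml.020003000060060160a2a01a007eea3c745c6edd11524149442035"
print(decode_vml(VML))  # (3, '60060160a2a01a007eea3c745c6edd11', 'RAID 5')
```

Since an NAA identifier is unique per logical unit, finding the same NAA id behind both /dev/sda and /dev/sdb would confirm they are one physical LUN seen twice, while different ids would mean two distinct LUNs.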
Some dmesg output:
VMWARE: Device that would have been attached as scsi disk sda at scsi1, channel 0, id 2, lun 0
Has not been attached because this path is not active.
key = 0x2, asc = 0x4, ascq = 0x1
VMWARE: Device that would have been attached as scsi disk sda at scsi1, channel 0, id 2, lun 0
Has not been attached because it is a duplicate path or on a passive path
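The sense data in the dmesg output can be decoded against the standard SCSI tables. This Python sketch carries only a handful of entries from the T10 SPC sense key and ASC/ASCQ lists (the dictionaries are a deliberately tiny subset, and decode_sense is a hypothetical helper); for "key = 0x2, asc = 0x4, ascq = 0x1" it returns NOT READY / LOGICAL UNIT IS IN PROCESS OF BECOMING READY. It only translates the codes and says nothing about why the array returned them.

```python
import re

# Tiny subset of the SPC sense key table.
SENSE_KEYS = {0x0: "NO SENSE", 0x2: "NOT READY",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION"}
# Tiny subset of the SPC additional sense code / qualifier table.
ASC_ASCQ = {
    (0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY",
    (0x04, 0x02): "LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED",
    (0x04, 0x03): "LOGICAL UNIT NOT READY, MANUAL INTERVENTION REQUIRED",
    (0x25, 0x00): "LOGICAL UNIT NOT SUPPORTED",
}
# Matches the dmesg format "key = 0x2, asc = 0x4, ascq = 0x1".
SENSE_RE = re.compile(
    r"key = 0x([0-9a-fA-F]+), asc = 0x([0-9a-fA-F]+), ascq = 0x([0-9a-fA-F]+)")


def decode_sense(line):
    """Return (sense key text, asc/ascq text), or None if no sense data."""
    m = SENSE_RE.search(line)
    if m is None:
        return None
    key, asc, ascq = (int(g, 16) for g in m.groups())
    return (SENSE_KEYS.get(key, f"sense key 0x{key:x}"),
            ASC_ASCQ.get((asc, ascq), f"asc/ascq 0x{asc:02x}/0x{ascq:02x}"))


print(decode_sense("key = 0x2, asc = 0x4, ascq = 0x1"))
```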
I have never dealt with an issue like this, so any pointers would be appreciated.
Cheers