VMware Cloud Community
mbent
Contributor

Help on path thrashing

Hi Gurus!

I need a helping hand with a path thrashing problem.

Environment: two Dell PowerEdge R900 hosts running ESX 3.5 Update 2, connected to an EMC CLARiiON CX4; each host has two QLogic QLE2460 HBAs (firmware 4.00.29).

Output from the first host:

# esxcfg-mpath -l -v

Disk vmhba1:0:0 vml.020000000050060160b9a00cdf50060160b9a00cdf4c554e5a2020 (0MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->ffffffffffffffff:ffffffffffffffff vmhba1:0:0 Dead preferred

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->ffffffffffffffff:ffffffffffffffff vmhba2:0:0 Dead

Disk vmhba2:1:4 vml.020004000060060160c0c92200d6ed69adb93fde11524149442035 /dev/sdf (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->500601603b2013f3:50060160bb2013f3 vmhba2:1:4 Standby preferred

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->500601693b2013f3:50060160bb2013f3 vmhba1:1:4 On active

Disk vmhba2:1:8 vml.020008000060060160c0c922007c4f8a11ba3fde11524149442035 /dev/sdj (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->500601603b2013f3:50060160bb2013f3 vmhba2:1:8 Standby preferred

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->500601693b2013f3:50060160bb2013f3 vmhba1:1:8 On active

Disk vmhba2:1:9 vml.020009000060060160c0c922003af75a26ba3fde11524149442035 /dev/sdk (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->500601603b2013f3:50060160bb2013f3 vmhba2:1:9 Standby preferred

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->500601693b2013f3:50060160bb2013f3 vmhba1:1:9 On active

Disk vmhba2:1:2 vml.020002000060060160c0c92200ce25a88ab93fde11524149442035 /dev/sdd (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->500601603b2013f3:50060160bb2013f3 vmhba2:1:2 Standby preferred

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->500601693b2013f3:50060160bb2013f3 vmhba1:1:2 On active

Disk vmhba2:1:1 vml.020001000060060160c0c92200e071e07bb93fde11524149442035 /dev/sdb (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->500601603b2013f3:50060160bb2013f3 vmhba2:1:1 Standby preferred

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->500601693b2013f3:50060160bb2013f3 vmhba1:1:1 On active

Disk vmhba2:1:7 vml.020007000060060160c0c92200d088acf8b93fde11524149442035 /dev/sdi (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->500601603b2013f3:50060160bb2013f3 vmhba2:1:7 Standby preferred

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->500601693b2013f3:50060160bb2013f3 vmhba1:1:7 On active

Disk vmhba2:1:5 vml.020005000060060160c0c922007acabfbdb93fde11524149442035 /dev/sdg (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->500601603b2013f3:50060160bb2013f3 vmhba2:1:5 Standby preferred

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->500601693b2013f3:50060160bb2013f3 vmhba1:1:5 On active

Disk vmhba3:0:0 vml.02000000006001ec90e1d59c00102db6ed04ce6c7b504552432036 /dev/sdc (69376MB) has 1 paths and policy of Fixed

Local 28:0.0 vmhba3:0:0 On active preferred

Disk vmhba2:1:3 vml.020003000060060160c0c92200e6024a9eb93fde11524149442035 /dev/sde (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->500601603b2013f3:50060160bb2013f3 vmhba2:1:3 Standby preferred

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->500601693b2013f3:50060160bb2013f3 vmhba1:1:3 On active

Disk vmhba2:1:6 vml.020006000060060160c0c92200968e7eceb93fde11524149442035 /dev/sdh (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->500601603b2013f3:50060160bb2013f3 vmhba2:1:6 Standby preferred

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->500601693b2013f3:50060160bb2013f3 vmhba1:1:6 On active

Disk vmhba2:1:0 vml.020000000060060160c0c9220064ce5427b93fde11524149442035 /dev/sda (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321bfe70:2000001b321bfe70<->500601603b2013f3:50060160bb2013f3 vmhba2:1:0 Standby preferred

FC 15:0.0 2100001b321bd06c:2000001b321bd06c<->500601693b2013f3:50060160bb2013f3 vmhba1:1:0 On active

=============================================================================

Output from the second host:

# esxcfg-mpath -l -v

Disk vmhba1:0:0 vml.020000000050060160b9a00cdf50060160b9a00cdf4c554e5a2020 (0MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321b8533:2000001b321b8533<->5006016239a00cdf:50060160b9a00cdf vmhba1:0:0 Standby preferred

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->5006016939a00cdf:50060160b9a00cdf vmhba2:0:0 Dead

Disk vmhba1:1:4 vml.020004000060060160c0c92200d6ed69adb93fde11524149442035 /dev/sdf (409600MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321b8533:2000001b321b8533<->500601693b2013f3:50060160bb2013f3 vmhba1:1:4 On active preferred

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->500601603b2013f3:50060160bb2013f3 vmhba2:1:4 Standby

Disk vmhba1:1:8 vml.020008000060060160c0c922007c4f8a11ba3fde11524149442035 /dev/sdj (409600MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321b8533:2000001b321b8533<->500601693b2013f3:50060160bb2013f3 vmhba1:1:8 On active preferred

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->500601603b2013f3:50060160bb2013f3 vmhba2:1:8 Standby

Disk vmhba1:1:9 vml.020009000060060160c0c922003af75a26ba3fde11524149442035 /dev/sdk (409600MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321b8533:2000001b321b8533<->500601693b2013f3:50060160bb2013f3 vmhba1:1:9 On active preferred

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->500601603b2013f3:50060160bb2013f3 vmhba2:1:9 Standby

Disk vmhba1:1:2 vml.020002000060060160c0c92200ce25a88ab93fde11524149442035 /dev/sdd (409600MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321b8533:2000001b321b8533<->500601693b2013f3:50060160bb2013f3 vmhba1:1:2 On active preferred

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->500601603b2013f3:50060160bb2013f3 vmhba2:1:2 Standby

Disk vmhba1:1:1 vml.020001000060060160c0c92200e071e07bb93fde11524149442035 /dev/sdb (409600MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321b8533:2000001b321b8533<->500601693b2013f3:50060160bb2013f3 vmhba1:1:1 On active preferred

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->500601603b2013f3:50060160bb2013f3 vmhba2:1:1 Standby

Disk vmhba1:1:7 vml.020007000060060160c0c92200d088acf8b93fde11524149442035 /dev/sdi (409600MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321b8533:2000001b321b8533<->500601693b2013f3:50060160bb2013f3 vmhba1:1:7 On active preferred

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->500601603b2013f3:50060160bb2013f3 vmhba2:1:7 Standby

Disk vmhba1:1:5 vml.020005000060060160c0c922007acabfbdb93fde11524149442035 /dev/sdg (409600MB) has 2 paths and policy of Most Recently Used

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->500601603b2013f3:50060160bb2013f3 vmhba2:1:5 Standby

FC 15:0.0 2100001b321b8533:2000001b321b8533<->500601693b2013f3:50060160bb2013f3 vmhba1:1:5 On active preferred

Disk vmhba1:1:3 vml.020003000060060160c0c92200e6024a9eb93fde11524149442035 /dev/sde (409600MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321b8533:2000001b321b8533<->500601693b2013f3:50060160bb2013f3 vmhba1:1:3 On active preferred

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->500601603b2013f3:50060160bb2013f3 vmhba2:1:3 Standby

Disk vmhba1:1:6 vml.020006000060060160c0c92200968e7eceb93fde11524149442035 /dev/sdh (409600MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321b8533:2000001b321b8533<->500601693b2013f3:50060160bb2013f3 vmhba1:1:6 On active preferred

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->500601603b2013f3:50060160bb2013f3 vmhba2:1:6 Standby

Disk vmhba3:0:0 vml.02000000006001ec90e1d25200102ca99f04ab5afc504552432036 /dev/sdc (69376MB) has 1 paths and policy of Fixed

Local 28:0.0 vmhba3:0:0 On active preferred

Disk vmhba1:1:0 vml.020000000060060160c0c9220064ce5427b93fde11524149442035 /dev/sda (409600MB) has 2 paths and policy of Most Recently Used

FC 15:0.0 2100001b321b8533:2000001b321b8533<->500601693b2013f3:50060160bb2013f3 vmhba1:1:0 On active preferred

FC 27:0.0 2100001b321b7d33:2000001b321b7d33<->500601603b2013f3:50060160bb2013f3 vmhba2:1:0 Standby

=========================================================================================================

vmkernel log excerpt (host 087):

Jun 3 12:01:02 087 vmkernel: 28:12:58:19.050 cpu9:1063)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:0:0 returned Storage initiator error 0x0/0x0 sk 0x0 asc 0x0 ascq 0x0

Jun 3 12:01:02 087 vmkernel: 28:12:58:19.050 cpu9:1063)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:0:0 returned Storage initiator error 0x0/0x0 sk 0x0 asc 0x0 ascq 0x0

Jun 3 12:01:02 087 vmkernel: 28:12:58:19.050 cpu9:1063)WARNING: SCSI: 4559: Manual switchover to path vmhba1:0:0 begins.

Jun 3 12:01:02 087 vmkernel: 28:12:58:19.050 cpu9:1063)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:0:0 returned Storage initiator error 0x0/0x0 sk 0x0 asc 0x0 ascq 0x0

Jun 3 12:01:02 087 vmkernel: 28:12:58:19.050 cpu9:1063)WARNING: SCSI: 3743: Could not switchover to vmhba1:0:0. Check Unit Ready Command returned an error instead of NOT READY for standby controller.

Jun 3 12:01:02 087 vmkernel: 28:12:58:19.050 cpu9:1063)WARNING: SCSI: 4619: Manual switchover to vmhba1:0:0 completed unsuccessfully.

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.248 cpu0:1024)SCSI: 634: Queue for device vml.020000000050060160b9a00cdf50060160b9a00cdf4c554e5a2020 is being blocked to check for hung SP.

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.249 cpu9:1063)SCSI: 2741: Could not locate path to peer SP for CX SP A path vmhba1:0:0.

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.249 cpu9:1063)SCSI: 2741: Could not locate path to peer SP for CX SP A path vmhba1:0:0.

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.249 cpu9:1063)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:0:0 returned Storage initiator error 0x0/0x0 sk 0x0 asc 0x0 ascq 0x0

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.249 cpu9:1063)WARNING: SCSI: 2896: CheckUnitReady on vmhba2:0:0 returned Storage initiator error 0x0/0x0 sk 0x0 asc 0x0 ascq 0x0

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.249 cpu9:1063)WARNING: SCSI: 4559: Manual switchover to path vmhba1:0:0 begins.

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.250 cpu9:1063)WARNING: SCSI: 2896: CheckUnitReady on vmhba1:0:0 returned Storage initiator error 0x0/0x0 sk 0x0 asc 0x0 ascq 0x0

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.250 cpu9:1063)WARNING: SCSI: 3743: Could not switchover to vmhba1:0:0. Check Unit Ready Command returned an error instead of NOT READY for standby controller.

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.250 cpu9:1063)WARNING: SCSI: 4619: Manual switchover to vmhba1:0:0 completed unsuccessfully.

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.448 cpu0:1024)SCSI: 634: Queue for device vml.020000000050060160b9a00cdf50060160b9a00cdf4c554e5a2020 is being blocked to check for hung SP.

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.450 cpu5:1064)SCSI: 2741: Could not locate path to peer SP for CX SP A path vmhba1:0:0.

Jun 3 12:01:03 087 vmkernel: 28:12:58:19.450 cpu5:1064)SCSI: 2741: Could not locate path to peer SP for CX SP A path vmhba1:0:0.

These log entries keep repeating in the vmkernel log.
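For anyone following along, the repeating entries can be watched live from the service console (standard Linux tooling on ESX 3.5; /var/log/vmkernel is the default log location):

# tail -f /var/log/vmkernel | grep -i scsi

(tails the vmkernel log and keeps only the SCSI path messages)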

Also, when I try to manage the paths for LUN vmhba1:1:9 on ESX host 086 from the VI Client, nothing is displayed and the following error appears: InvalidArgument=Value of '0' is not valid for 'index'. Parameter name: index

I don't have much experience with this and I'm not a SAN guy. Could you help me understand the problem (or problems) and give me some guidance toward a solution?

Thanks

11 Replies
AndreTheGiant
Immortal

Be sure that everything is OK on the storage side.

If you have path thrashing, you will see LUN trespasses on the storage side, or you may have connections that are not registered.

Also check your fabric topology to be sure it is correct.
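A rough sketch of what to check from the ESX side first (option names per the ESX 3.x esxcfg-mpath documentation, so confirm against esxcfg-mpath --help on your build):

# esxcfg-mpath -a

(lists the host's HBAs with their WWNs, so the storage team can confirm the initiators are registered against the CLARiiON in Navisphere)

# esxcfg-mpath -l | grep -i dead

(re-lists every path and flags the dead ones)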

Andre

**if you found this or any other answer useful please consider allocating points for helpful or correct answers

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
mbent
Contributor

The storage guys are telling me that everything is OK on their side...

Would it be better to call support?

AndreTheGiant
Immortal

(Accepted solution)

Would it be better to call support?

Yes :)

Andre

**if you found this or any other answer useful please consider allocating points for helpful or correct answers

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
bobross
Hot Shot

There is most certainly a problem on the storage side: your management LUNs (a.k.a. pseudo-LUNs) are not communicating with your server, so your data LUNs can't figure out which path to use. Remember, in the CLARiiON architecture the SPs do not change paths by themselves; they must be told to, in the process known as trespass. It looks like the trespass request cannot get through, since the paths to the 0 MB LUN are dead. I'd get your storage guys to help you here; take the normal steps like checking zoning, the name server, etc., and use all the support you can get. Good luck.
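For the zoning and name-server checks, this is roughly what to ask the fabric admins to run, assuming Brocade switches (Cisco MDS uses different commands):

# nsshow

(name server: confirm both HBA WWPNs and the SP ports have logged in to each fabric)

# zoneshow

(effective zoning: each initiator should be zoned to ports on both SPs)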

okeedokee
Enthusiast

Here is a VMware article on the issue. This helped my understanding.

Regards

mvoss18
Hot Shot

We've seen path thrashing when folks manually load-balanced paths in a simplistic manner, for example setting odd-numbered LUNs to use SP1 while even-numbered LUNs use SP2. If LUN1 and LUN2 are part of the same disk group, the same SP should be preferred for both; if SP1 and SP2 try to hit that disk group simultaneously, the SAN can thrash.

Make sure, if manually setting preferred paths, that the same SP is preferred for all LUNs within a disk group. A sketch of how that looks on the command line follows.
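A hedged sketch on ESX 3.5. Note that preferred paths only take effect under the Fixed policy, so this applies only if you deliberately move off MRU; the option names are from the ESX 3.x SAN configuration guide, so verify with esxcfg-mpath --help on your build:

# esxcfg-mpath --policy=fixed --lun=vmhba1:1:1

# esxcfg-mpath --preferred --path=vmhba1:1:1 --lun=vmhba1:1:1

(repeat for every LUN in the disk group, always preferring a path to the same SP)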

mbent
Contributor

Thanks to all for your helpful answers.

admin
Immortal

The path policy of the LUNs should be set to MRU. This prevents path thrashing.

If you need to use the Fixed path policy with an active/passive array (such as the CX), care must always be taken when configuring static load balancing.
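For reference, a sketch of forcing a LUN back to MRU from the service console (flag names per the ESX 3.x docs; confirm with esxcfg-mpath --help):

# esxcfg-mpath --policy=mru --lun=vmhba1:1:0

# esxcfg-mpath -l | grep policy

(the second command re-lists the LUNs so you can verify each one now shows 'Most Recently Used')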

GraphiteDak
Enthusiast

I have an entire cluster doing this today. I am starting to think that the previous bug may have accidentally been reintroduced in a newer patch. We're running the current patch level, build 163429, on the 16 hosts in this cluster, and it's the only cluster having the issue.

n2l
Contributor

This cluster is on build 110268.

ThompsG
Virtuoso

Evening,

Not sure if you still have the problem or not, but we had a similar case. I'm making the following assumptions, so disregard this if it's not the case in your environment:

1. Dual fabrics

2. Multiple switches in each fabric connected via ISLs

In our case the ISL ports were reading low Tx power (less than 100 µW). We replaced the ISL cables and the problem went away. We only discovered this after connecting a new ESX server to a switch that had to traverse the ISL to reach the storage. Weird but true. That server would eventually bring datastores down with these errors.
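If you want to check for the same thing, a hedged sketch assuming Brocade switches (the exact commands differ on other vendors' fabrics):

# sfpshow <slot/port>

(shows Tx/Rx power for the optic in that port; run it against the ISL ports on both switches)

# porterrshow

(error counters such as CRCs that usually climb alongside weak optics)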

Trust this helps your situation.

Kind regards,

Glen
