VMware Cloud Community
MSchaff
Contributor

Sporadic disk latency alarms on shared F/C disk array

Hello everyone,

I'm seeing some strange disk latency alarms and wanted to get some feedback on them.  Our environment includes an IBM blade chassis and a Dell blade chassis.  Each chassis connects through its own internal F/C switches to an EMC VNX5200 disk array.  The IBM chassis uses Cisco switches connecting at 4 Gbps, and the Dell chassis uses Brocade switches connecting at 8 Gbps.  All hosts are running ESXi 6.0.

Most of the VMs running on the IBM chassis are contained in individual LUNs, with some exceptions.  The VMs on the Dell chassis are generally grouped into LUNs, with a given LUN containing up to 12 or 15 individual VMs.

Periodically, but with no consistency, one or two VMs running on the Dell server blades will report high disk latency.  The alarm condition is raised, then clears itself after a few seconds.  I'd like to identify the root cause, as it happens sporadically throughout the day with no pattern that I can see.  I might see it happen twice a day, or not at all.  Another aspect that puzzles me is that it usually happens to only a single VM at a time, even though that VM is just one of several sharing the same LUN.  Other VMs on that same LUN are unaffected, though I know they all have disk activity taking place.  At the time the alarms are raised, VMs in other LUNs on the array show a status of normal, even though all LUNs are in the same performance pool and share the same redundant F/C links to the disk array.

I would expect that if the disk array was experiencing some type of latency issue, all LUNs would be affected, and most VMs would report the same latency alarm, but that doesn't happen.  Most often, only a single VM alarm is raised, though I do see an occasional instance where a few VM alarms are raised.  A moment or two later, the alarms clear.
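To test that expectation against real data, one option is to export per-VM latency samples and check, for each sampling interval, how many VMs on the same LUN were above the alarm threshold at once.  The following is only a rough sketch: the tuple layout, the `classify_spikes` name, and the 20 ms threshold are all assumptions made for illustration, not anything vCenter produces in this form.

```python
from collections import defaultdict

def classify_spikes(samples, threshold_ms=20.0):
    """Group latency samples by (timestamp, LUN) and label each spike.

    samples: iterable of (timestamp, lun, vm, latency_ms) tuples
             (hypothetical layout, e.g. parsed from exported stats).
    threshold_ms: assumed alarm threshold; adjust to match your alarm.
    """
    by_slot = defaultdict(dict)
    for ts, lun, vm, latency_ms in samples:
        by_slot[(ts, lun)][vm] = latency_ms

    results = {}
    for slot, vms in by_slot.items():
        # VMs on this LUN that exceeded the threshold in this interval.
        hot = sorted(vm for vm, ms in vms.items() if ms > threshold_ms)
        if not hot:
            continue
        if len(vms) > 1 and len(hot) == len(vms):
            kind = "lun-wide"    # every VM on the LUN spiked together
        elif len(hot) == 1:
            kind = "single-vm"   # only one VM spiked
        else:
            kind = "partial"     # some, but not all, VMs spiked
        results[slot] = (kind, hot)
    return results
```

If nearly every event comes back as "single-vm", that would point toward something specific to the VM or its I/O pattern rather than the LUN or the array as a whole.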

Does anyone have any thoughts they can share on this?  I don't expect that this is a common scenario, but if you've seen it, or if you have any suggestions on how to resolve it, I'd be very interested in hearing about them.  Thanks in advance for your insights on this!

Mitchell

2 Replies
DavoudTeimouri
Virtuoso

First, which storage array is this? Are you using a mixed pool or a RAID group? How many disks are in the pool or RAID group?

I think the VMs are generating a lot of I/O, and that the problem is related to the pool or LUN rather than to the storage array itself.

You should use smaller LUNs with fewer VMs.

-------------------------------------------------------------------------------------
Davoud Teimouri - https://www.teimouri.net - Twitter: @davoud_teimouri Facebook: https://www.facebook.com/teimouri.net/
MSchaff
Contributor

Hello Davoud, and thanks for the reply.  The storage array is an EMC VNX5200.  The LUNs are located on a mixed pool of drives, with the pool consisting of:

18 15K 900 GB SAS drives

8 10K 1900 GB SAS drives
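For a rough sense of what this pool can sustain, here is a back-of-the-envelope sketch.  The per-drive IOPS figures are generic rules of thumb that I am assuming, not measurements from this array, and the calculation ignores RAID write penalty, cache, and tiering, so treat the result as an order-of-magnitude number only.

```python
# Assumed rule-of-thumb random IOPS per spindle (not measured on this array).
ASSUMED_IOPS_PER_DRIVE = {"15K": 180, "10K": 150}

def raw_pool_iops(drive_counts, per_drive=ASSUMED_IOPS_PER_DRIVE):
    """Sum raw back-end IOPS across drive types.

    drive_counts: mapping of drive type -> number of drives in the pool.
    Ignores RAID write penalty, array cache, and tiering effects.
    """
    return sum(count * per_drive[kind] for kind, count in drive_counts.items())

# Pool composition as listed above.
pool = {"15K": 18, "10K": 8}
print(raw_pool_iops(pool))  # 18*180 + 8*150 = 4440
```

Comparing a figure like this with the combined peak I/O of the VMs in the pool may help show whether short bursts could plausibly saturate the back end.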

I can break the LUNs up into smaller sizes, as one option.  I'm still puzzled why the VMs in a given LUN aren't all affected at the same time.

Does the above information help lead to any conclusions?

Thanks!
