VMware Cloud Community
cc_infrastructu
Contributor

High write latency spikes. Number of VMs per iSCSI LUN with NetApp E-Series.

We have 3 x ESXi hosts, hooked up to a NetApp E2824 (4 x SSD cache disks, 20 x 10K SAS, all in a single RAID6 pool).

There are around 55 VMs on this cluster.

The hosts are hooked up to Juniper EX4300 switches with 3 x 1 Gbps adapters, using the software iSCSI adapter with Delayed ACK disabled.

There are 7 iSCSI LUNs, all backed by this single RAID6 pool.

Average disk write latencies are all low, with frequent spikes of very high values.

Average read latencies are not an issue, presumably because of the SSD cache.

The values below are the average VM disk write latency peaks over 1 hour, per LUN.

LUN 1 - 1 VM, Max avg VM latency: 42

LUN 2 - 3 VMs, Max avg VM latency: 61

LUN 3 - 3 VMs, Max avg VM latency: 22

LUN 4 - 9 VMs, Max avg VM latency: 200

LUN 5 - 11 VMs, Max avg VM latency: 170

LUN 6 - 6 VMs, Max avg VM latency: 137

LUN 7 - 21 VMs, Max avg VM latency: 500

Looking at this data, it would appear that having 21 VMs on a single LUN is creating a lot of write latency. However, all LUNs are backed by the same RAID6 pool, so I can't see how splitting that LUN into smaller ones would help, although logically that is the next step.
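As a rough check on the figures above, here is a minimal Python sketch that computes the Pearson correlation between VM count and peak write latency per LUN. The numbers are the ones posted above; the `pearson` helper is just written for this illustration. With only seven data points it shows an association at best and says nothing about the cause.

```python
from math import sqrt

# (LUN, VM count, max avg VM write latency), copied from the list above
luns = [
    (1, 1, 42),
    (2, 3, 61),
    (3, 3, 22),
    (4, 9, 200),
    (5, 11, 170),
    (6, 6, 137),
    (7, 21, 500),
]

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

vm_counts = [vms for _, vms, _ in luns]
latencies = [lat for _, _, lat in luns]

r = pearson(vm_counts, latencies)
print(f"Pearson r between VM count and peak write latency: {r:.2f}")
```

The coefficient comes out strongly positive, which matches the impression that the busiest LUN has the worst spikes, but it cannot tell a per-LUN limit apart from a few heavy VMs that happen to sit on the larger LUNs.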

1 Reply
nachogonzalez
Commander

Hey, hope you are doing fine

Just some points to bring to the table:

- The 1 Gbps links on the Juniper switches might be a bottleneck.
- The number of VMs is not the most accurate way to gauge IOPS/latency.
Why? One or two very IO-intensive SQL/Oracle VMs can place the same load on the array as 5 or 10 ordinary VMs.
- IOPS and latency are not static values; they can change during the day.
Do you have vROps available? Have you checked the vCenter performance charts?
- Have you checked the HBA load using the esxtop views d (adapter), u (device) and v (VM)?
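To put a number on the 1 Gbps point, here is a back-of-the-envelope sketch in Python. The link speed and link count come from the original post; the 64 KiB IO size is purely a hypothetical value for illustration, and the aggregate figure assumes multipathing really spreads traffic across all three links and ignores protocol overhead.

```python
LINK_GBPS = 1.0            # per-adapter link speed, from the original post
LINKS = 3                  # iSCSI adapters, from the original post
IO_SIZE_BYTES = 64 * 1024  # hypothetical write size, for illustration only

# 1 Gbps = 1e9 bits/s; divide by 8 for bytes/s
link_bytes_per_s = LINK_GBPS * 1e9 / 8
link_mb_s = link_bytes_per_s / 1e6

# Best case only: assumes load is spread evenly over every link
aggregate_mb_s = link_mb_s * LINKS

# Upper bound on IOs per second one link can carry at the assumed IO size
iops_ceiling_per_link = link_bytes_per_s / IO_SIZE_BYTES

print(f"Per link:  {link_mb_s:.0f} MB/s theoretical")
print(f"Aggregate: {aggregate_mb_s:.0f} MB/s theoretical")
print(f"IOPS ceiling per link at {IO_SIZE_BYTES // 1024} KiB: {iops_ceiling_per_link:.0f}")
```

If the write bursts seen in esxtop or the vCenter charts get close to these ceilings, the network is a plausible suspect; if they stay well below, the array is the more likely place to look.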

I think the most tactical approach would be to identify the most IO-intensive VMs on LUN 7 (using esxtop views d, u and v) and balance the load across LUNs 5, 6 and 7.
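The balancing idea can be sketched as a simple greedy placement: sort the VMs by measured write load and always put the next heaviest one on the currently least-loaded LUN. This is only a minimal sketch; the VM names and IOPS figures below are hypothetical placeholders, to be replaced with what esxtop or vCenter actually reports, and the `rebalance` function is not part of any VMware tool.

```python
import heapq

def rebalance(vm_load, datastores):
    """Greedy placement: heaviest VM first, onto the least-loaded datastore."""
    heap = [(0.0, name) for name in datastores]
    heapq.heapify(heap)
    placement = {name: [] for name in datastores}
    ordered = sorted(vm_load.items(), key=lambda kv: kv[1], reverse=True)
    for vm, load in ordered:
        total, name = heapq.heappop(heap)   # least-loaded datastore so far
        placement[name].append(vm)
        heapq.heappush(heap, (total + load, name))
    totals = {
        name: sum(vm_load[vm] for vm in vms)
        for name, vms in placement.items()
    }
    return placement, totals

# Hypothetical average write IOPS per VM (not measured values)
vm_write_iops = {
    "sql01": 900, "ora01": 750, "file01": 300, "app01": 200,
    "app02": 180, "web01": 120, "web02": 110, "dc01": 40,
}

placement, totals = rebalance(vm_write_iops, ["LUN5", "LUN6", "LUN7"])
for lun in sorted(placement):
    print(lun, totals[lun], placement[lun])
```

Keep in mind that all the LUNs share the same RAID6 pool, so a placement like this can at most even out per-LUN contention; it does not add any capacity on the array side.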


Also, if possible (and if licensing allows it), consider enabling Storage DRS and creating a datastore cluster.


If you have any doubts or I was not clear, please let me know and I will gladly help.


Warm regards