We have 3 x ESXi hosts, hooked up to a NetApp E2824 (4 x SSD cache disks, 20 x 10K SAS - all in a single RAID6 pool).
There are around 55 VMs on this cluster.
The hosts are hooked up to Juniper EX4300 switches with 3 x 1Gbps adapters each, using the software iSCSI adapter with Delayed Ack disabled.
There are 7 iSCSI LUNs, all backed by this single RAID6 pool.
Average disk write latencies are low overall, but there are frequent spikes to very high values.
Average read latencies are not an issue, presumably because of the SSD cache.
The values below are the average VM disk write latency peaks over 1 hour, per LUN.
LUN 1 - 1 VM, Max avg VM latency: 42
LUN 2 - 3 VMs, Max avg VM latency: 61
LUN 3 - 3 VMs, Max avg VM latency: 22
LUN 4 - 9 VMs, Max avg VM latency: 200
LUN 5 - 11 VMs, Max avg VM latency: 170
LUN 6 - 6 VMs, Max avg VM latency: 137
LUN 7 - 21 VMs, Max avg VM latency: 500
Looking at this data, it appears that having 21 VMs on a single LUN is creating a lot of write latency. But since all LUNs are backed by the same RAID6 array, I can't see how splitting that LUN into smaller ones would help - even though, logically, that is the next step.
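For what it's worth, a quick Pearson correlation on the VM-count vs. peak-latency figures above (just a throwaway sketch of the arithmetic) does show the two moving together strongly:

```python
from math import sqrt

# VM count and max average write latency per LUN, taken from the figures above.
vms     = [1, 3, 3, 9, 11, 6, 21]
latency = [42, 61, 22, 200, 170, 137, 500]

def pearson(x, y):
    """Plain-Python Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

r = pearson(vms, latency)
print(f"Pearson r = {r:.2f}")  # strongly positive
```

A strong correlation here doesn't prove the VM count is the cause (a couple of heavy writers on LUN 7 would produce the same picture), which is exactly why per-VM measurement is the next step.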
Hey, hope you're doing well.
Just some points to bring to the table:
- The Juniper 1Gbps links might be a bottleneck.
- The number of VMs is not an accurate way to gauge IOPS/latency.
Why? One or two very IO-intensive SQL/Oracle VMs can place the same load on the array as 5 or 10 ordinary VMs.
- IOPS and latency are not static values; they change throughout the day.
Do you have vROps available? Have you checked the vCenter performance charts?
- Have you checked HBA load, using esxtop (options d, u and v)?
I think the most tactical approach would be to identify the most IO-intensive VMs on LUN 7 (again using esxtop, options d, u and v) and balance that load across LUNs 5, 6 and 7.
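If you capture esxtop in batch mode (e.g. `esxtop -b` redirected to a file), the output is CSV and you can rank VMs by average write latency offline. A minimal sketch, assuming column headers that contain a string like `MilliSec/Write` (the exact header names vary by ESXi build and by the counters you select, so adjust the match string to your capture; the sample data here is made up):

```python
import csv
import io

# Hypothetical fragment of esxtop batch-mode CSV output; real headers look
# similar but vary by build -- replace with your own capture file.
SAMPLE = (
    '"Time",'
    '"\\\\host\\Virtual Disk(sql01)\\Average MilliSec/Write",'
    '"\\\\host\\Virtual Disk(web01)\\Average MilliSec/Write"\n'
    '"10:00:05","412.1","18.3"\n'
    '"10:00:10","388.7","22.9"\n'
)

def rank_write_latency(csv_text, match="MilliSec/Write"):
    """Return (column_name, mean_value) pairs for matching columns, worst first."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    means = []
    for i, name in enumerate(header):
        if match in name:
            vals = [float(row[i]) for row in data]
            means.append((name, sum(vals) / len(vals)))
    return sorted(means, key=lambda kv: kv[1], reverse=True)

for name, avg in rank_write_latency(SAMPLE):
    print(f"{avg:8.1f} ms  {name}")
```

Whichever VMs float to the top are the candidates to Storage vMotion off LUN 7.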
Also, if possible (and your licensing allows it), consider enabling Storage DRS and creating a datastore cluster.
If anything is unclear or you have any doubts, please let me know; I will gladly help.
Warm regards