We have an ESXi 6.0 Update 1 host with 18 VMs that do not have a heavy usage profile. We migrated from local storage (4x 600 GB 15K in RAID 10) to a VNXe SAN and now we have lots of latency. We are also seeing this latency on an older SAN that we moved 4 VMs to just for testing. The latency does not go high and stay high; it bounces from 0-2 ms up to 500-1000 ms in spikes. I also see the same spikes in the throughput section on our disks, but they come from the ESXi server itself, not from a single VM.
We have disabled delayed ACK and tried setting iops=10 in the round-robin policy for our iSCSI connection (the iSCSI connection is two 1 Gb NICs, each going to a separate VLAN). When moving files between SANs we get almost 90 MB/s, so throughput is available.
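For reference, the round-robin IOPS change mentioned above is applied per device with esxcli; a sketch, where the naa ID is a placeholder for your actual device:

```shell
# List devices under the native multipathing plugin to find the naa IDs.
esxcli storage nmp device list

# Switch paths after every 10 I/Os instead of the default 1000.
# naa.xxxxxxxxxxxx is a placeholder for the real device ID.
esxcli storage nmp psp roundrobin deviceconfig set \
    --type=iops --iops=10 --device=naa.xxxxxxxxxxxx
```

Note this setting is per device, so it has to be repeated (or scripted) for each LUN, and the device must already be using the VMW_PSP_RR path selection policy.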
I'm just wondering what could be causing this latency?
Have you enabled jumbo frames on your vSwitches and on the physical switches between the hosts and storage?
I just had Force10 look at our configuration to make sure, and it's all configured correctly with jumbo frames, flow control, and edge-port settings.
One more data point: the IBM SAN is not using jumbo frames while the EMC currently is, and they both have this issue. Looking at esxtop, I see kernel latency spike when this happens:
9:39:57pm up 2 days 5:05, 715 worlds, 7 VMs, 28 vCPUs; CPU load average: 0.08, 0.08, 0.07
ADAPTR PATH NPTH CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
vmhba0 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba1 - 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba32 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba33 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba34 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba35 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba36 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba37 - 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba38 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba39 - 22 162.57 0.38 162.20 0.00 41.38 4.89 285.24 290.13 82.26
Then, seconds later, it looks like this:
9:43:45pm up 2 days 5:09, 715 worlds, 7 VMs, 28 vCPUs; CPU load average: 0.08, 0.08, 0.08
ADAPTR PATH NPTH CMDS/s READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
vmhba0 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba1 - 1 14.69 14.69 0.00 0.38 0.00 0.11 0.01 0.12 0.00
vmhba32 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba33 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba34 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba35 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba36 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba37 - 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba38 - 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
vmhba39 - 22 142.60 0.00 142.60 0.00 1.55 1.28 0.01 1.29 0.00
This IBM SAN currently only has 6 VMs, so there is no reason it should spike.
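If you capture the esxtop adapter view to a file, a quick awk sketch like the following can pull out the spike rows (the sample file here is just two rows copied from the output above; in practice you would filter a captured log):

```shell
# Two sample rows from the adapter view above (normal vs. spiking);
# in practice you would filter a captured esxtop log instead.
cat > esxtop-sample.txt <<'EOF'
vmhba0   -   0   0.00   0.00   0.00   0.00   0.00  0.00   0.00   0.00   0.00
vmhba39  -  22 162.57   0.38 162.20   0.00  41.38  4.89 285.24 290.13  82.26
EOF

# Flag rows where kernel latency (KAVG/cmd, the 10th column) exceeds 2 ms.
awk '$1 ~ /^vmhba/ && $10 > 2 { print $1, "KAVG/cmd =", $10 }' esxtop-sample.txt
# prints: vmhba39 KAVG/cmd = 285.24
```

Worth noting: in the spike sample, DAVG/cmd (device latency) stays at 4.89 ms while KAVG/cmd hits 285 ms, which suggests the time is being spent inside the VMkernel (queueing/pathing) rather than at the array.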
What type of disks? SAS or FC?
Do you have any graph of this latency to share with us?
Please try to generate a graph of read and write latency.
In the case of the DS3500, which is where all these VMs are currently running, it has five 600 GB 15K SAS disks in RAID 5.
The EMC SAN has 2 SSDs for tier 1, then 20 1 TB 10K SAS disks in 4 RAID 5 groups...
The following values are recommended, based upon the type of disk used:
FC: 20-30 ms
SAS: 20-30 ms
SATA: 30-50 ms
SSD: 15-20 ms
Is Storage I/O Control enabled?
Be sure that you have jumbo frames configured at every point (host adapter, switch, and storage), as mentioned before.
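An end-to-end jumbo frame check can be done from the host itself; a sketch, where vmk1 and the target IP are placeholders for your iSCSI vmknic and storage interface:

```shell
# Confirm the vSwitch and VMkernel port MTUs are 9000.
esxcli network vswitch standard list
esxcli network ip interface list

# Send a full-size frame (8972-byte payload + 28 bytes of ICMP/IP headers
# = 9000) with don't-fragment set. If this fails while a plain vmkping
# works, jumbo frames are not configured end to end.
vmkping -I vmk1 -d -s 8972 192.168.1.100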
These instances are free versions of ESXi, so no vCenter or Storage I/O Control.
So you will have a lot of troubleshooting to do.
Check whether this high latency is on a specific LUN. If you find anything, try migrating the VMs.
Also check on the storage side to see what is happening.
And check https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1267
and maybe the VMware KB article "Troubleshooting ESX/ESXi virtual machine performance issues" (the storage section).