Hi, I have a lab setup for testing some VMs.
All licenses are regular. The hardware is a simple node, a Dell T7910 with a Dell LSI MegaRAID 9341-8i, with one SSD pool and one HDD pool.
For no apparent reason, I see disk latency increase without limit; the last time it reached 40 seconds.
At this moment nobody is using this server or its VMs, yet latency is 720 ms for some VMs; no tasks are running and none are scheduled.
I've tried some tools in a Windows VM to generate I/O stress on the SSD pool, with some tests at 40k IOPS on the SSD pool, a queue depth of 16, and a 4k test size.
The ESXi version is 8.0 Update 1.
Can someone help me debug this?
Can you please share esxtop screenshots of the adapter view (press d), the device view (press u), and the VM view (press v)?
PS: I'm looking for the DAVG, KAVG, and GAVG values.
Also share the host task list with counts:
vim-cmd vimsvc/task_list |wc -l
vim-cmd vimsvc/task_list
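If screenshots are awkward to take while the latency spike is happening, esxtop can also record the same counters to a file in batch mode. The following is only a minimal sketch: the function name `capture_esxtop`, its default values, and the output path are illustrative assumptions, not something from this thread.

```shell
# Sketch: record esxtop in batch mode so the counters can be shared as a file.
# The function name, defaults and output path are illustrative assumptions.
capture_esxtop() {
    interval="${1:-5}"      # seconds between samples
    samples="${2:-12}"      # number of samples to record
    out="${3:-/tmp/esxtop-capture.csv}"
    # -b = batch mode, -d = delay between samples, -n = number of iterations
    esxtop -b -d "$interval" -n "$samples" > "$out"
}
```

For example, `capture_esxtop 5 12 /tmp/esxtop-capture.csv` would record roughly one minute of samples into a CSV file that can be attached to the thread.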
Here is a thread about a similar latency issue (but on HP hardware):
https://communities.vmware.com/t5/vSphere-Storage-Discussions/Disk-Latency-Issues/td-p/2535575
There is a lot of discussion in that post, but it came down to looking at the storage configuration (in that case it was a RAID 6). How is your MegaRAID configured? That thread could help.
Hi! The MegaRAID is configured with a single SSD pool in RAID 5; no spare is present and all disks are healthy. I've also double-checked the virtual volume, and there are no logs or problems reported by the controller or the BIOS, so I think the virtual volume and all SSDs are OK and the configuration is valid.
FYI, I forgot to specify that the HDD pool is not behind the MegaRAID controller; it is connected only via SATA on the motherboard.
thanks!
To show that I can reproduce the issue, I've attached a screenshot taken just now, while deploying a virtual machine from a template.
Can you please confirm the controller driver version?
https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=35201&vcl=true
The compatibility guide lists: ESXi 8.0 U1, lsi_mr3 version 7.724.03.00-1vmw.
You can run the command below:
for a in $(esxcfg-scsidevs -a |awk '{print $2}') ;do echo $a; vmkload_mod -s $a |grep -i version ;done | awk '!a[$0]++'; for a in $(esxcfg-scsidevs -a |awk '{print $1}'); do vmkchdev -l |grep $a | awk -F" |:" '{print "http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID="$4"&DID="$5"&SVID="$6"&SSID="$7"&details=1"}'; done | awk '!a[$0]++'; vmware -vl
PS: Just checking: is the Dell server compatible with running ESXi 8.x?
==
According to the screenshot, the storage device session thread is maxed out at 100% usage and the host is queuing I/O. This is why you see very high latency on DAVG; since the host is re-sending I/O, there is additional latency injected by the host on KAVG, and the total shows up on GAVG.
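To make the relationship between the three counters concrete, here is a small hedged sketch. It relies on GAVG being the sum of DAVG and KAVG; the function name `classify_latency` and the thresholds (25 ms for DAVG, 2 ms for KAVG) are assumptions based on commonly quoted rules of thumb, not official limits, and the sample values in the usage note are made up rather than read from the screenshot.

```shell
# Sketch: classify where disk latency comes from, given DAVG and KAVG in ms.
# Thresholds are rule-of-thumb assumptions; override with DAVG_MAX / KAVG_MAX.
classify_latency() {
    awk -v davg="$1" -v kavg="$2" \
        -v davg_max="${DAVG_MAX:-25}" -v kavg_max="${KAVG_MAX:-2}" 'BEGIN {
        gavg = davg + kavg                      # guest latency = device + kernel
        if (davg > davg_max && kavg > kavg_max) where = "device+kernel"
        else if (davg > davg_max)               where = "device"
        else if (kavg > kavg_max)               where = "kernel"
        else                                    where = "ok"
        printf "GAVG=%.2f ms, bottleneck=%s\n", gavg, where
    }'
}
```

With hypothetical values, `classify_latency 700 20` prints `GAVG=720.00 ms, bottleneck=device+kernel`, which is the pattern described above: the device is slow and the host adds its own queuing delay on top.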
--
Regarding the host task list, please disable the FCD disk query on your vCenter if you are not using VMware Tanzu or Kubernetes.
Hi, sorry for the late reply.
I've run some tests, including changing the controller and doing a fresh VMware installation (version 7).
At the moment, the command you sent me gives this output:
vmw_ahci
Version: 2.0.9-1vmw.702.0.0.17867351
lsi_msgpt3
Version: 17.00.10.00-2vmw.702.0.0.17867351
lsi_mr3
Version: 7.716.03.00-1vmw.702.0.0.17867351
http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID=8086&DID=8d02&SVID=1028&SS...
http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID=1000&DID=0097&SVID=1028&SS...
http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID=1000&DID=005f&SVID=1000&SS...
VMware ESXi 7.0.2 build-18538813
VMware ESXi 7.0 Update 2
Changing the controller had no effect, and neither did changing the VMware version. I'm currently working on a Docker Swarm test infrastructure, and it is impossible to run any kind of task.
The PCI port is PCI Express 3.0 x16.
I don't know where to start looking for solutions or similar problems. Look at that fantastic screenshot.
thanks!