VMware Cloud Community
matteocavalliim
Contributor

Latency on datastore

Hi, I have a lab setup for testing some VMs.

All licenses are in order. The hardware is a single Dell T7910 node with a Dell LSI MegaRAID 9341-8i, one SSD pool and one HDD pool.

For no apparent reason, disk latency keeps increasing without limit; the last time it reached 40 seconds.

At the moment, nobody is using this server or its VMs, yet latency is 720 ms on some VMs; no tasks are running and none are scheduled.
I have used some tools in a Windows VM to generate I/O stress on the SSD pool; the tests reached about 40k IOPS, as expected for the SSD pool, with a queue depth of 16 and a 4K block size.
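As a rough consistency check, Little's law relates the three benchmark numbers: average latency = outstanding I/Os / IOPS. This is a sketch that assumes the queue depth of 16 was the total number of outstanding I/Os and that the run reached steady state; the function names are made up for this example. With 16 outstanding I/Os at 40k IOPS the pool was answering in about 0.4 ms at roughly 156 MiB/s, which makes the 720 ms seen while idle the real anomaly.

```shell
#!/bin/sh
# Little's law: average latency (s) = outstanding I/Os / IOPS.
# Inputs below are the figures from the post: queue depth 16, ~40,000 IOPS, 4 KiB blocks.

expected_latency_ms() {
  # $1 = queue depth (outstanding I/Os), $2 = IOPS
  awk -v qd="$1" -v iops="$2" 'BEGIN { printf "%.2f\n", qd / iops * 1000 }'
}

throughput_mib_s() {
  # $1 = IOPS, $2 = block size in KiB
  awk -v iops="$1" -v kib="$2" 'BEGIN { printf "%.2f\n", iops * kib / 1024 }'
}

echo "expected latency: $(expected_latency_ms 16 40000) ms"
echo "throughput:       $(throughput_mib_s 40000 4) MiB/s"
```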

The ESXi version is 8.0 Update 1.

Can someone help me debug this?

bbalido9
Contributor

Can you please share esxtop screenshots of the disk adapter view (press d), the disk device view (press u) and the disk VM view (press v)?

PS: I am looking for the DAVG, KAVG and GAVG values.

Also share the host task list and the number of tasks:

vim-cmd vimsvc/task_list |wc -l 

vim-cmd vimsvc/task_list 
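With the output format assumed here (a "(ManagedObjectReference) [" line, one 'vim.Task:...' line per task, then a closing bracket), wc -l also counts the two bracket lines. A minimal sketch that counts only task entries and groups them by type; the sample and its task names are hypothetical placeholders, not real host output, and the function names are made up:

```shell
# Hypothetical sample in the format assumed for "vim-cmd vimsvc/task_list";
# on the host, pipe the real command into the functions instead.
sample_task_list() {
  cat <<'EOF'
(ManagedObjectReference) [
   'vim.Task:haTask-ha-host-vim.Example.taskA-101',
   'vim.Task:haTask-ha-host-vim.Example.taskA-102',
   'vim.Task:haTask--vim.Example.taskB-103'
]
EOF
}

# Count only lines that name a task object (grep -c prints 0 but exits 1 on no match).
count_tasks() {
  grep -c "'vim\.Task:" || true
}

# Strip the trailing numeric task id and list the most frequent task types first.
top_task_types() {
  sed -n "s/.*'vim\.Task:\(.*\)-[0-9][0-9]*'.*/\1/p" | sort | uniq -c | sort -rn
}

sample_task_list | count_tasks      # on the host: vim-cmd vimsvc/task_list | count_tasks
sample_task_list | top_task_types   # on the host: vim-cmd vimsvc/task_list | top_task_types
```

A long list dominated by one task type usually points at whatever is generating it, which is more useful than the raw count.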

 

NateNateNAte
Hot Shot

Here is a thread about a similar latency issue (on HP hardware):

https://communities.vmware.com/t5/vSphere-Storage-Discussions/Disk-Latency-Issues/td-p/2535575

There is a lot of discussion in that thread, but it came down to the storage configuration (in that case, RAID 6). How is your MegaRAID configured? That thread could help.

matteocavalliim
Contributor

Hi!

I have attached a screenshot taken just now, during a VM deployment from a template, to make sure I can reproduce the problem.

Do you see anything strange?

Thanks!

matteocavalliim
Contributor

Hi! The MegaRAID is configured with a single SSD pool in RAID 5; there is no hot spare and all disks are healthy. I have also double-checked the virtual volume, and neither the controller nor the BIOS has logged any problems, so I believe the virtual volume and all SSDs are fine and the configuration is valid.

FYI, I forgot to mention that the HDD pool is not behind the MegaRAID controller; it is connected directly to the SATA ports on the motherboard.

Thanks!

Hmzi1545432
Contributor

To make sure that I can replicate the issue, I have attached a screenshot taken during a virtual machine deployment from a template.

bbalido9
Contributor

Can you please confirm the controller driver version?

https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=35201&vcl=true

ESXi 8.0 U1: lsi_mr3 version 7.724.03.00-1vmw
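To compare an installed driver with its HCL entry, a small helper can order two version strings. This is a sketch that relies on sort -V from GNU coreutils; whether the busybox sort on an ESXi host supports -V is an assumption, so it may be simpler to run it on a workstation. The function name is made up, and the values are only illustrative: always compare against the HCL listing for the ESXi release actually installed, since the entry above is the one for 8.0 U1.

```shell
# version_ge A B: succeeds (exit 0) when version A >= version B.
# Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

installed=7.716.03.00   # illustrative: upstream part of the lsi_mr3 version on the host
listed=7.724.03.00      # illustrative: version listed on the HCL for the installed release
if version_ge "$installed" "$listed"; then
  echo "lsi_mr3 $installed is at or above the HCL version $listed"
else
  echo "lsi_mr3 $installed is older than the HCL version $listed"
fi
```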

 

You can run the command below:

for a in $(esxcfg-scsidevs -a |awk '{print $2}') ;do echo $a; vmkload_mod -s $a |grep -i version ;done | awk '!a[$0]++'; for a in $(esxcfg-scsidevs -a |awk '{print $1}'); do vmkchdev -l |grep $a | awk -F" |:" '{print "http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID="$4"&DID="$5"&SVID="$6"&SSID="$7"&details=1"}'; done | awk '!a[$0]++'; vmware -vl
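The one-liner combines two steps that are easier to check in isolation. A sketch, assuming vmkchdev -l prints one device per line as "address VID:DID SVID:SSID owner name" separated by single spaces, so that splitting on a space or colon puts VID, DID, SVID and SSID in fields 4 to 7. The sample line (its PCI address and SSID) is hypothetical and the function names are made up. The awk '!a[$0]++' idiom prints a line only the first time it is seen, so it removes duplicates while keeping the original order, unlike sort -u.

```shell
# Build the compatibility-guide URL from one "vmkchdev -l" line.
# Splitting on space or colon: $1-$3 = PCI address, $4 VID, $5 DID, $6 SVID, $7 SSID.
hcl_url() {
  awk -F' |:' '{ print "http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID=" $4 "&DID=" $5 "&SVID=" $6 "&SSID=" $7 "&details=1" }'
}

# Print each distinct line once, in first-seen order.
dedupe() {
  awk '!seen[$0]++'
}

# Hypothetical sample line (address and SSID made up for illustration).
echo "0000:00:11.4 8086:8d02 1028:0617 vmkernel vmhba0" | hcl_url
printf 'lsi_mr3\nlsi_mr3\nvmw_ahci\n' | dedupe
```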

PS: have you checked that the Dell server is supported for ESXi 8.x?

"https://www.vmware.com/resources/compatibility/detail.php?deviceCategory=io&productid=35201&vcl=tru...

== 

Based on the screenshot, the storage device session thread is maxed out at 100% usage and the host is queuing I/O. That is why DAVG shows very high latency; and because the host is re-sending I/O, it injects additional latency of its own (KAVG), which adds up to the total shown in GAVG.
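The relationship described above can be written down: in esxtop, GAVG (latency seen by the guest) is approximately DAVG (device) plus KAVG (kernel), and %USD is the share of the device queue occupied by active commands (ACTV / DQLEN x 100). A sketch that applies commonly quoted rule-of-thumb thresholds (DAVG above about 25 ms, KAVG above about 2 ms); these thresholds are general guidance rather than values from this thread, the input numbers are hypothetical, and the function names are made up.

```shell
# classify_latency DAVG KAVG: values in milliseconds, as read from esxtop's disk views.
classify_latency() {
  awk -v davg="$1" -v kavg="$2" 'BEGIN {
    gavg = davg + kavg    # guest latency is roughly device latency plus kernel latency
    printf "GAVG~%.1f ms:", gavg
    if (davg > 25) printf " device/array slow (DAVG>25)"
    if (kavg > 2)  printf " host queuing (KAVG>2)"
    if (davg <= 25 && kavg <= 2) printf " within rule-of-thumb limits"
    printf "\n"
  }'
}

# queue_usage ACTV DQLEN: percentage of the device queue occupied by active commands (%USD).
queue_usage() {
  awk -v actv="$1" -v qlen="$2" 'BEGIN { printf "%.0f\n", actv / qlen * 100 }'
}

classify_latency 700 20   # hypothetical split of a 720 ms guest latency
queue_usage 32 32         # hypothetical: every queue slot busy
```

When %USD sits at 100 and KAVG climbs, the host is waiting for queue slots on the device, so the device side (DAVG) is where to look first.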

-- 

Regarding the host task list: if you are not using VMware Tanzu or Kubernetes, please disable the FCD disk query on your vCenter.

If you are not planning to use Kubernetes/Tanzu in the near future, you can disable Catalog Sync (and the log messages it generates).
- To do this, please make a copy of the file /usr/lib/vmware-vpx/sps/conf/vslm.properties
- Then edit the original and add the following line at the end:
 
vslm.disablePeriodicSync = Y
 
- Save the file, then restart the vmware-sps service with:
# vmon-cli -r sps
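The backup-and-append steps above can be scripted so that re-running them neither duplicates the setting nor overwrites the first backup. A minimal sketch for the vCenter appliance shell; the function name and the .bak suffix are hypothetical choices, and it is worth trying it on a copy of the file first.

```shell
# disable_vslm_sync [FILE]: back up vslm.properties once, then append the setting if missing.
# Default path is the one given in the post above.
disable_vslm_sync() {
  f="${1:-/usr/lib/vmware-vpx/sps/conf/vslm.properties}"
  line='vslm.disablePeriodicSync = Y'
  [ -f "$f" ] || { echo "not found: $f" >&2; return 1; }
  # Keep the very first backup; do not overwrite it on later runs.
  [ -e "$f.bak" ] || cp -p "$f" "$f.bak" || return 1
  # -x: whole-line match, -F: fixed string (no regex).
  if grep -qxF "$line" "$f"; then
    return 0
  fi
  # If the file does not end with a newline, add one so the setting lands on its own line.
  if [ -n "$(tail -c 1 "$f")" ]; then
    printf '\n' >> "$f"
  fi
  printf '%s\n' "$line" >> "$f"
}

# Usage on the appliance, followed by the service restart described above:
#   disable_vslm_sync && vmon-cli -r sps
```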
 
 

 

 

matteocavalliim
Contributor

Hi, sorry for the late reply.

I have run some tests, including swapping the controller and doing a fresh VMware installation (version 7).

Right now, the command you sent me produces this output:

vmw_ahci
Version: 2.0.9-1vmw.702.0.0.17867351
lsi_msgpt3
Version: 17.00.10.00-2vmw.702.0.0.17867351
lsi_mr3
Version: 7.716.03.00-1vmw.702.0.0.17867351
http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID=8086&DID=8d02&SVID=1028&SS...
http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID=1000&DID=0097&SVID=1028&SS...
http://partnerweb.vmware.com/comp_guide2/search.php?deviceCategory=io&VID=1000&DID=005f&SVID=1000&SS...
VMware ESXi 7.0.2 build-18538813
VMware ESXi 7.0 Update 2

Changing the controller had no effect, and neither did changing the VMware version. I was actually working on a Docker Swarm test infrastructure, and it is impossible to get any task done.

The PCI slot is PCI Express 3.0 x16.

I don't know where to start looking for solutions or similar problems. Look at that fantastic screenshot.

Thanks!
