VMware Performance Community
Virtualboy1
Contributor

Should I be concerned about these DAVG figures? Storage latency?

I have suspected storage latency issues in our VMware environment for a while and have been working on ways to solve the problem. The core issue is that ESXi hosts disconnect from the vCenter Server and become greyed out. When this happens, we can't do much with the host or the VMs on it except shut them down and start them on a different ESXi host.

As a result, I've been monitoring with esxtop while I work on remediation and try to understand the bottlenecks. Previously, DAVG figures were spiking above 25 ms. As I understand it, anything above 25 ms is bad and indicates storage latency.

However, today while monitoring one of the hosts I came across some crazy DAVG figures, and to me they look bad.

Now, I understand I can't just look at DAVG and need to look at the bigger picture.

Can someone please look at the attached screenshot and let me know how bad this is? Are spikes like this in DAVG normal, or should it ideally not get this high even for a spike?

More about the environment below.

VMware ESXi, 7.0.3, 20842708

vCenter 7.0.3 21477706

Dell PowerEdge hardware

Dell Storage Center SCSI storage array

iSCSI is used to connect to the storage array via the software iSCSI adapter

Backups run through most of the day, so there is no definite window when backups are or aren't running

We also have Zerto, which is used for replication

Finally, we also have Live Volumes at the storage array layer

We're looking at things like queue depth on the iSCSI vmk port and making sure Round Robin is used for path selection. I should mention that all firmware and drivers were updated recently; this has been double-checked, including compatibility.

3 Replies
dmorse
VMware Employee

@Virtualboy1 This particular board is for assistance with the VMmark benchmark only.  I would suggest posting this under ESXi Discussions - VMware Technology Network VMTN instead.

Thanks,
David

Tapas124
Contributor

These values are high.

DAVG/cmd, KAVG/cmd, and GAVG/cmd should not exceed 10 milliseconds (ms) for sustained periods of time. If these values stay consistently high, I would check for contention and latency on the storage side.

Since backups are running, I would also check the read values on the storage side and validate whether the high reads are causing the spikes.
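To separate short spikes from sustained latency, a rough sketch like the following could be run over a list of DAVG samples (one value in ms per line). The 10 ms limit is the figure mentioned above; the function name `sustained_over` and the idea of requiring 3 consecutive samples are illustrative assumptions, not VMware guidance.

```shell
# sustained_over THRESHOLD_MS MIN_RUN
# Hypothetical helper: reads one latency sample (ms) per line on stdin.
# Prints "sustained" if at least MIN_RUN consecutive samples exceed
# THRESHOLD_MS, "spikes" if some samples exceed it but never MIN_RUN
# in a row, and "ok" otherwise.
sustained_over() {
    awk -v limit="$1" -v need="$2" '
        # Sample above the limit: extend the current run and remember the longest run.
        $1 + 0 > limit + 0 { run++; hits++; if (run > best) best = run; next }
        # Sample at or below the limit: the run is broken.
        { run = 0 }
        END {
            if (best + 0 >= need + 0) print "sustained"
            else if (hits > 0) print "spikes"
            else print "ok"
        }'
}

# Example: printf '3\n40\n2\n5\n' | sustained_over 10 3
```

A result of "spikes" during a backup window is a different situation from "sustained", which is the case worth taking to the storage team first.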

Also, please check the HBA latency. The screenshot appears to show individual LUNs.

In esxtop, press d to switch to the disk adapter (HBA) view.
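If watching the interactive view is not practical, esxtop can also be run in batch mode (`-b`) and the CSV reviewed afterwards. The sketch below pulls the peak DAVG per column out of such a file. It assumes the DAVG counter appears in the CSV header as "Average Driver MilliSec/Command" and that no field contains an embedded comma; the function name `max_davg` and the file path are hypothetical, so check the header of your own capture first.

```shell
# Capture on the host first, for example 120 samples at 5 second intervals:
#   esxtop -b -d 5 -n 120 > /tmp/esxtop.csv

# max_davg CSV_FILE
# Hypothetical helper: prints "<counter name><TAB><max value>" for every
# column whose header contains "Average Driver MilliSec/Command"
# (assumed to be the DAVG counter name in batch output).
max_davg() {
    awk -F',' '
        NR == 1 {
            # Remember which columns are DAVG counters.
            for (i = 1; i <= NF; i++)
                if (index($i, "Average Driver MilliSec/Command")) {
                    name[i] = $i
                    gsub(/"/, "", name[i])
                }
            next
        }
        {
            # Track the largest value seen in each DAVG column.
            for (i in name) {
                v = $i
                gsub(/"/, "", v)
                if (v + 0 > max[i] + 0) max[i] = v + 0
            }
        }
        END { for (i in name) print name[i] "\t" max[i] + 0 }
    ' "$1"
}
```

Running `max_davg /tmp/esxtop.csv` gives one line per adapter or device counter, which makes it easier to see whether the high values are tied to one path or spread across all of them.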


tmahanta
na1231
Contributor

A DAVG value of 25 is certainly high. By default, the Round Robin path selection policy switches paths only after 1000 I/Os on each LUN.

Run the following command against the specific LUN that is showing a high DAVG value to reduce the IOPS limit from 1000 to a lower value (the command below sets it to 1). This is an online activity with no downtime.

for i in `esxcfg-scsidevs -c |awk '{print $1}' | grep <Enter LUN NAA ID>`; do esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=$i; done

This can reduce latency significantly.

Also ask your storage team to check the latency from the storage side.
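Before changing anything, it may help to look at the current setting with `esxcli storage nmp psp roundrobin deviceconfig get --device=<NAA ID>`. As a cautious alternative to the loop, the sketch below only prints the command for one device after some basic checks, so it can be reviewed before it is run. The function name `rr_iops_cmd` and the sample device ID are hypothetical.

```shell
# rr_iops_cmd DEVICE IOPS
# Hypothetical dry-run helper: validates its arguments and prints (does not
# run) the esxcli command that sets the Round Robin IOPS limit for one device.
rr_iops_cmd() {
    # Only accept device IDs in naa.* form.
    case "$1" in
        naa.*) ;;
        *) echo "expected an naa.* device ID, got: $1" >&2; return 1 ;;
    esac
    # Reject empty or non-numeric IOPS values.
    case "$2" in
        ''|*[!0-9]*) echo "IOPS must be a positive integer" >&2; return 1 ;;
    esac
    [ "$2" -ge 1 ] || { echo "IOPS must be a positive integer" >&2; return 1; }
    echo "esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=$2 --device=$1"
}

# Example (made-up device ID):
#   rr_iops_cmd naa.6000d31000abc 1
```

Whatever value is chosen, it is worth checking it against the array vendor's own recommendation for Round Robin before applying it across all LUNs.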
