I am seeing many more performance deterioration messages on devices with a queue depth lower than 32 than on devices with a queue depth of 32.
I used the command esxcli storage core device set -d naa.xxx -O <value> to set the queue depth to 32 on many of the LUNs, but after some time the values were changed back to less than 32.
From a performance perspective, I would like to know whether the latency seen on the VMs is caused by the low queue depth settings, or whether the low queue depth is a symptom of high storage latency (SIOC detects storage latency and therefore reduces the queue depth on the device).
I found KB https://kb.vmware.com/s/article/1268, where it says "No. of outstanding IOs with competing worlds parameter is limited to Max Queue Depth of the device"
Is the "Max Queue Depth of the device" the queue depth set at the HBA/nfnic driver level with esxcli system module parameters set -m nfnic -p lun_queue_depth_per_path=32?
Could this be happening because I didn't update the setting on all hosts, so that some hosts are set to 32 and others to lower values? Or would it be advisable to change the queue depth to 64 instead?
Thanks in advance.
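To make the KB statement concrete, here is a minimal Python sketch of the per-device limit. It assumes the usual ESXi behavior: the "No of outstanding IOs with competing worlds" value (the one set with -O) only applies when more than one world (for example, several VMs) is issuing I/O to the same device, and it is capped at the device's max queue depth. The device max queue depth typically reflects the HBA driver's per-LUN setting (such as lun_queue_depth_per_path for nfnic). The function and parameter names are hypothetical, for illustration only.

```python
def effective_outstanding_ios(device_max_qd: int, dsnro: int, active_worlds: int) -> int:
    """Upper bound on I/Os one ESXi host keeps in flight to one device (simplified model).

    device_max_qd: "Device Max Queue Depth", typically derived from the HBA
                   driver's per-LUN queue depth setting.
    dsnro:         "No of outstanding IOs with competing worlds" (set with -O).
    active_worlds: number of worlds (e.g. VMs) currently issuing I/O to the device.
    """
    if active_worlds <= 0:
        return 0
    if active_worlds == 1:
        # A single world is limited only by the device queue depth.
        return device_max_qd
    # With competing worlds, the -O value applies, but never above the device queue depth.
    return min(dsnro, device_max_qd)


# Hypothetical values for illustration.
print(effective_outstanding_ios(device_max_qd=32, dsnro=64, active_worlds=4))  # 32
print(effective_outstanding_ios(device_max_qd=64, dsnro=32, active_worlds=4))  # 32
print(effective_outstanding_ios(device_max_qd=64, dsnro=32, active_worlds=1))  # 64
```

In this model, raising -O above the device max queue depth buys nothing; to go beyond 32 per LUN, the driver-level per-LUN queue depth would have to be raised as well.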
Having different queue depths across hosts that share the same storage array is not recommended, because performance can then vary depending on which host the workload runs on. It is recommended to keep the queue depth the same across all hosts.
On the other hand, there are scenarios where the queue depth differs by default because of the HBA model in use, or because the software iSCSI adapter is used. For example, the default queue depth is 128 for the software iSCSI adapter, 64 for QLogic, and 32 for Emulex and Brocade.
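A minimal Python sketch of the consistency check implied above: given each host's queue depth for LUNs on the same array, flag the hosts that differ from the most common value in the cluster. The adapter defaults are the ones quoted in this answer; the host names, adapter mix and helper names are hypothetical, and in practice the actual values should be read from each host (for example, the "Device Max Queue Depth" field in esxcli storage core device list).

```python
from collections import Counter

# Default queue depths quoted in the answer above (actual values can vary by driver/version).
DEFAULT_QUEUE_DEPTH = {
    "iscsi_sw": 128,
    "qlogic": 64,
    "emulex": 32,
    "brocade": 32,
}


def find_mismatched_hosts(host_queue_depths: dict[str, int]) -> dict[str, int]:
    """Return the hosts whose queue depth differs from the most common value in the cluster."""
    if not host_queue_depths:
        return {}
    most_common_qd, _ = Counter(host_queue_depths.values()).most_common(1)[0]
    return {host: qd for host, qd in host_queue_depths.items() if qd != most_common_qd}


# Hypothetical cluster: three hosts with QLogic HBAs and one with an Emulex HBA,
# all left at their adapter defaults.
adapters = {"esx01": "qlogic", "esx02": "qlogic", "esx03": "qlogic", "esx04": "emulex"}
depths = {host: DEFAULT_QUEUE_DEPTH[model] for host, model in adapters.items()}
print(find_mismatched_hosts(depths))  # {'esx04': 32}
```

Even with nobody changing a setting, a mixed-adapter cluster ends up with uneven queue depths, which is why the values are worth checking on every host.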
As you said, SIOC manages the device queue depth when it detects latency, but the queue depth itself should be tuned to decrease contention when accessing the SAN. To do that, you need to check every hop the I/O crosses to reach the array, such as the HBA, the SAN switch and the array itself (of course, this differs depending on the storage protocol in use).
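To see why every hop matters, here is a rough worst-case sketch in Python: if every per-LUN queue feeding a hop (the host HBA, or an array target port shared by several hosts) were full at the same time, would that hop's queue limit be exceeded? This is a simplified heuristic, not a vendor formula. The hop names, queue limits and fan-in numbers are hypothetical, and the real limits have to come from the HBA, switch and array documentation.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    name: str         # e.g. "HBA", "array target port" (illustrative)
    queue_limit: int  # max outstanding I/Os this hop accepts (vendor-specific, hypothetical here)
    fan_in: int       # number of per-LUN queues that can feed into this hop


def oversubscribed_hops(lun_queue_depth: int, hops: list[Hop]) -> list[tuple[str, int, int]]:
    """Worst case: every per-LUN queue feeding a hop is full at once.

    Returns (hop name, worst-case outstanding I/Os, hop limit) for each hop
    whose limit would be exceeded.
    """
    result = []
    for hop in hops:
        worst_case = lun_queue_depth * hop.fan_in
        if worst_case > hop.queue_limit:
            result.append((hop.name, worst_case, hop.queue_limit))
    return result


# Hypothetical setup: one host HBA sees 20 LUNs; one array port is shared by 8 hosts x 20 LUNs.
hops = [
    Hop(name="HBA", queue_limit=2048, fan_in=20),
    Hop(name="array target port", queue_limit=4096, fan_in=8 * 20),
]
print(oversubscribed_hops(32, hops))  # [('array target port', 5120, 4096)]
```

In this hypothetical setup, 32 per LUN already oversubscribes the shared array port in the worst case, and going to 64 doubles that load, which is one reason to check the whole path before raising the queue depth.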
Here is a really good post that explains how to check each of these for one vendor's array: https://www.codyhosterman.com/2017/02/understanding-vmware-esxi-queuing-and-the-flasharray/