VMware Cloud Community
Paul1966
Contributor

Nonsense DAVG/KAVG numbers

I'm investigating performance issues on a VDPA appliance. When looking at disk performance in esxtop, I get nonsensical values that effectively cancel each other out:

1:05:06pm up 21 days 3 min, 711 worlds, 2 VMs, 6 vCPUs; CPU load average: 0.02, 0.02, 0.02

DEVICE                                PATH/WORLD/PARTITION DQLEN WQLEN ACTV QUED %USD  LOAD   CMDS/s  READS/s WRITES/s MBREAD/s MBWRTN/s DAVG/cmd KAVG/cmd GAVG/cmd QAVG/cmd
mpx.vmhba32:C0:T0:L0                           -               1     -    0    0    0  0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
mpx.vmhba40:C0:T0:L0                           -               1     -    0    0    0  0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
naa.5000c5007941272e                           -             128     -    0    0    0  0.00     0.38     0.00     0.38     0.00     0.00     0.28     0.02     0.30     0.00
naa.6006016070e03900014e140e434ae411           -             128     -    0    0    0  0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
naa.6006016070e03900299d3519434ae411           -             128     -    0    0    0  0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
naa.6006016070e0390099bd9dfe424ae411           -             128     -    0    0    0  0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00     0.00
naa.6006016070e03900c2cb46a4b35be411           -             128     -    0    0    0  0.00     0.38     0.00     0.00     0.00     0.00 9223372036854776.00 -9223372036854774.00     0.82     0.01
naa.6006016070e03900cc195b8b434ae411           -             128     -    0    0    0  0.00     0.38     0.00     0.00     0.00     0.00     0.00     0.81     0.81     0.01
naa.6006016070e03900e6dcbf2d434ae411           -             128     -    0    0    0  0.00     0.38     0.00     0.00     0.00     0.00     0.00     0.65     0.65     0.01
{NFS}FreeNAS.4.80                              -               -     -    0    -    -     -    22.32    18.69     3.62    10.87     0.01        -        -    87.92        -

The naa.600 entries are on an EMC SAN. These values were pointed out to me by VMware support as an indication that I have an issue with the FreeNAS NFS volume this appliance is stored on. Looking at the actual FreeNAS server, it's connected via a 10G iSCSI link to the storage pool and there's hardly any traffic on it. What could cause this, and should I worry, or is it safe to ignore?

1 Reply
vNEX
Expert

Welcome to the community,

Extremely high DAVG/KAVG values can be caused by VAAI activity; for more info see this KB:

VMware KB: Abnormal DAVG and KAVG values observed during VAAI operations
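As a side note, a DAVG of 9223372036854776.00 paired with a KAVG of almost the same magnitude but negative looks like a wrapped/saturated signed 64-bit counter (roughly 2^63 microseconds shown in milliseconds); the two effectively cancel out, so GAVG, which is derived as DAVG + KAVG, still comes out at a sane 0.82 ms. That matches the symptom the KB describes. To confirm whether VAAI is actually in play on the affected EMC LUN, a quick check from the ESXi shell could look like the sketch below (the naa. identifier is simply taken from your esxtop output; run it against any affected device):

# Check the VAAI (hardware acceleration) status of the LUN showing the wrapped counters
# (device identifier taken from the esxtop output above)
esxcli storage core device vaai status get -d naa.6006016070e03900c2cb46a4b35be411

# The VAAI data-mover primitives can also be inspected (and, for testing only, disabled)
# via the advanced host settings
esxcli system settings advanced list -o /DataMover/HardwareAcceleratedMove
esxcli system settings advanced list -o /DataMover/HardwareAcceleratedInit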

Regarding the FreeNAS issue and the high GAVG (87.92 ms): how is the host connected to the NFS volume in terms of network topology?

Are the VMkernel ports that carry the NFS traffic on the same subnet as the FreeNAS server? I am asking because prior to ESXi 5.0, Layer 3 (routed) NFS was not supported.
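A quick way to answer that from the host itself is sketched below; the vmk interface name and the FreeNAS address are placeholders you would replace with your own values:

# List the VMkernel interfaces with their IPv4 addresses and netmasks
esxcli network ip interface ipv4 get

# Show the NFS mounts and the server address ESXi uses for them
esxcli storage nfs list

# Show the VMkernel routing table; if the FreeNAS address falls outside every vmk
# subnet, the NFS traffic is being routed through the gateway (Layer 3)
esxcfg-route -l

# Optionally test reachability (on newer ESXi builds, -I selects the outgoing VMkernel port)
vmkping -I vmk0 192.168.1.10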

On a 5.0 U1 or later release it is supported, but you have to meet these requirements:

Using Layer 3 Routed Connections to Access NFS Storage

When you use Layer 3 (L3) routed connections to access NFS storage, consider certain requirements and restrictions.

Ensure that your environment meets the following requirements:

■  Use Cisco's Hot Standby Router Protocol (HSRP) on the IP router. If you are using a non-Cisco router, use Virtual Router Redundancy Protocol (VRRP) instead.

■  Use Quality of Service (QoS) to prioritize NFS L3 traffic on networks with limited bandwidth, or on networks that experience congestion. See your router documentation for details.

■  Follow the routed NFS L3 best practices recommended by your storage vendor. Contact your storage vendor for details.

■  Disable Network I/O Resource Management (NetIORM).

■  If you are planning to use systems with top-of-rack switches or switch-dependent I/O device partitioning, contact your system vendor for compatibility and support.

In an L3 environment the following restrictions apply:

■  The environment does not support VMware Site Recovery Manager.

■  The environment supports only NFS protocol. Do not use other storage protocols such as FCoE over the same physical network.

■  The NFS traffic in this environment does not support IPv6.

■  The NFS traffic in this environment can be routed only over a LAN. Other environments such as WAN are not supported.

For the VDPA, please also check the latency values using esxtop; have a look at LAT/rd and LAT/wr for the virtual machine.
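If you prefer to capture those counters instead of watching them live, esxtop batch mode can record them to a CSV for later review; the interval and sample count below are only examples:

# Interactive: run esxtop and press 'v' for the VM disk view, then read LAT/rd and LAT/wr
esxtop

# Batch mode: 15 samples at a 10-second interval, written to a CSV that can be
# opened in perfmon or a spreadsheet to filter on the VDPA VM's latency counters
esxtop -b -d 10 -n 15 > /tmp/esxtop-vdpa.csv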

If you find them high, you can try disabling NIC interrupt moderation, which can help reduce the latency:

Using ethtool on the desired vmnic (replace X with the vmnic number):

ethtool -C vmnicX rx-usecs 0 rx-frames 1 rx-usecs-irq 0 rx-frames-irq 0

or

Using esxcli for the NIC module/driver:

# esxcli system module parameters set -m ixgbe -p "InterruptThrottleRate=0"
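Either way, it is worth verifying the settings before and after the change. A possible check, assuming the ixgbe driver as in the example above, could look like this (keep in mind that module parameter changes only take effect after the module is reloaded, which in practice usually means a host reboot, while ethtool changes do not persist across reboots):

# List the uplinks and their drivers to confirm which vmnics actually use the ixgbe module
esxcli network nic list

# Show the current interrupt coalescing settings of a NIC (X = vmnic number)
ethtool -c vmnicX

# Confirm the module parameter has been set
esxcli system module parameters list -m ixgbe | grep InterruptThrottleRate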

Some consequences of the above settings are as follows:

While disabling interrupt moderation on physical NICs is extremely helpful in reducing latency for latency-sensitive VMs, it can lead to some performance penalties for other VMs on the ESXi host, as well as higher CPU utilization to handle the higher rate of interrupts from the physical NIC.

Disabling physical NIC interrupt moderation can also defeat the benefits of Large Receive Offloads (LRO), since some physical NICs (like Intel 10GbE NICs) that support LRO in hardware automatically disable it when interrupt moderation is disabled, and ESXi’s implementation of software LRO has fewer packets to coalesce into larger packets on every interrupt. LRO is an important offload for driving high throughput for large-message transfers at reduced CPU cost, so this trade-off should be considered carefully.

_________________________________________________________________________________________

If you found this or any other answer helpful, please consider awarding points (use the Correct or Helpful buttons).

Regards,
P.