Network Performance Analysis and Monitoring

Introduction

This page is a living, up-to-date version of the performance analysis methods whitepaper.

Check Utilization

esxtop provides network information on its network screen, which is displayed by pressing the 'n' key.

The following properties of this screen are worth particular attention:

Each row represents one of several relevant network items on the server: a physical NIC (vmnicX), a virtual switch interface (vswifX), a VM (contains the VM name), the VMkernel network stack (vmk-tcpip-A.B.C.D), and others.
The network items are organized by the virtual switch to which they are attached. The virtual switch name is listed under the DNAME column.
Network traffic on the hypervisor's iSCSI initiator will show up on the VMkernel network row which will contain the name "vmk-tcpip-A.B.C.D", where A.B.C.D is the VMkernel IP address.
Network traffic on an iSCSI initiator that was configured in the guest will show up on the vNIC, which is displayed under the VM's name on the network panel.
Total throughput for each item can be found by summing its transmitted data (MbTX/s) and received data (MbRX/s). As the physical hardware becomes saturated, transmitted and received packets will start to be dropped (%DRPTX and %DRPRX, respectively), which, depending on the protocol, may result in a retransmission at a later time.
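The throughput and drop checks described above can be sketched in a few lines of Python. This is only an illustration: the PortSample record and its field names are hypothetical and simply mirror the esxtop column names mentioned on this page; the values would have to be copied or exported from the network screen by whatever means is available.

```python
from dataclasses import dataclass


@dataclass
class PortSample:
    """One row of the esxtop network screen (hypothetical record layout)."""
    name: str      # network item, e.g. "vmnic0" or a VM name
    dname: str     # virtual switch the item is attached to (DNAME column)
    mb_tx: float   # MbTX/s
    mb_rx: float   # MbRX/s
    drp_tx: float  # %DRPTX
    drp_rx: float  # %DRPRX


def total_throughput(sample: PortSample) -> float:
    """Total throughput is transmitted plus received megabits per second."""
    return sample.mb_tx + sample.mb_rx


def is_dropping(sample: PortSample) -> bool:
    """True if any transmitted or received packets are being dropped."""
    return sample.drp_tx > 0 or sample.drp_rx > 0


def summarize(samples):
    """Return (name, total Mb/s, dropping?) for each row."""
    return [(s.name, total_throughput(s), is_dropping(s)) for s in samples]


if __name__ == "__main__":
    rows = [
        PortSample("vmnic0", "vSwitch0", 300.0, 450.0, 0.0, 1.5),
        PortSample("web01", "vSwitch0", 120.0, 80.0, 0.0, 0.0),
    ]
    for name, total, dropping in summarize(rows):
        print(f"{name}: {total:.1f} Mb/s total, dropping={dropping}")
```

A row that reports drops while its total throughput is near the link speed is the saturation pattern described above.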

Evaluate the Data

Does the physical NIC's reported speed and duplex setting match the expectation of the hardware? Hardware connectivity issues may result in a NIC autonegotiating to a lower speed or half duplex mode.
Is there a significant load on the appropriate network items? For instance, is a network-intensive load in a guest actually generating the network activity on its vNIC that is expected? Are storage-intensive loads generating traffic on the vmkNIC when the hypervisor initiator is used, or on the vNIC when a guest initiator is used?
Verify that the network traffic is flowing on the appropriate NICs. A typical ESX host may carry network traffic generated by VMs, iSCSI traffic, VMotion-related traffic, and service console traffic. It is recommended to have separate NICs handle these different types of traffic.
During periods of saturation, is the total throughput (MbTX/s summed with MbRX/s) matching expectations? Either the guest or the other end of the communication link may be throttling the performance.
Are packets being dropped? When overworked, the hardware will refuse packets, which are then reported as dropped transmitted (%DRPTX) and dropped received (%DRPRX) packets.
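The questions above can be turned into a small checklist function. The following Python sketch is an assumption-laden illustration, not a VMware tool: the function name, its parameters, and the wording of the findings are all hypothetical, and the inputs are expected to be read manually from the NIC configuration and the esxtop network screen.

```python
def evaluate_nic(expected_speed_mb, reported_speed_mb, full_duplex,
                 drp_tx_pct, drp_rx_pct):
    """Return a list of findings for one physical NIC.

    All arguments are values the administrator supplies by hand
    (hypothetical interface; nothing here queries the host).
    """
    findings = []
    # A NIC that negotiated below its rated speed suggests a cabling
    # or switch-port problem.
    if reported_speed_mb < expected_speed_mb:
        findings.append(
            f"speed {reported_speed_mb} Mb/s is below expected "
            f"{expected_speed_mb} Mb/s"
        )
    # Half duplex is another symptom of failed autonegotiation.
    if not full_duplex:
        findings.append("NIC is running in half duplex mode")
    # Any non-zero drop percentage means packets are being refused.
    if drp_tx_pct > 0:
        findings.append(f"dropping transmitted packets ({drp_tx_pct}%)")
    if drp_rx_pct > 0:
        findings.append(f"dropping received packets ({drp_rx_pct}%)")
    return findings


if __name__ == "__main__":
    # Example: a 1 Gb NIC that negotiated down to 100 Mb/s half duplex.
    for finding in evaluate_nic(1000, 100, False, 0.0, 2.5):
        print(finding)
```

An empty result means none of the listed symptoms were found; it does not by itself show that throughput matches expectations, which still has to be judged against the workload.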

Correct the System

Make sure that the hardware is configured to run at its maximum capability. This means verifying that 1 Gb NICs are not autonegotiating down to 100 Mb/s because they are connected to an older switch. Similarly, ensure that NICs are running in full duplex mode.
When network throughput seems lower than expected, apply traditional network diagnosis techniques to investigate every link in the connection. Low throughput at the ESX Server is not necessarily due to server configuration.
Verify that VMware Tools is installed on the guests and that TSO, Jumbo Frames, and 10 Gb Ethernet are enabled where possible.
Bond multiple physical NICs to virtual switches with high utilization.
Give each virtual switch its own physical NICs, and place network-intensive VMs on their own vSwitches.
If VMs running on the same ESX Server communicate with each other, connect them to a dedicated virtual switch so that all network transfers occur in memory and no packets are shipped over the wire.
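To decide which virtual switches are candidates for additional physical NICs, it can help to total the uplink traffic per vSwitch. The Python sketch below assumes rows of (name, DNAME, MbTX/s, MbRX/s) transcribed from the esxtop network screen; the tuple layout and the function name are hypothetical. Only physical NIC rows (names starting with "vmnic") are summed, on the assumption that counting VM ports as well would count traffic leaving the host twice.

```python
from collections import defaultdict


def uplink_throughput_by_vswitch(rows):
    """Sum MbTX/s + MbRX/s of physical NICs per virtual switch.

    rows: iterable of (name, dname, mb_tx, mb_rx) tuples
          (hypothetical layout mirroring the esxtop columns).
    Returns a list of (dname, total Mb/s), busiest vSwitch first.
    """
    totals = defaultdict(float)
    for name, dname, mb_tx, mb_rx in rows:
        # Physical NICs appear as vmnicX on the network screen.
        if name.startswith("vmnic"):
            totals[dname] += mb_tx + mb_rx
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    sample = [
        ("vmnic0", "vSwitch0", 400.0, 300.0),
        ("vmnic1", "vSwitch1", 50.0, 25.0),
        ("web01", "vSwitch0", 100.0, 100.0),
    ]
    for dname, total in uplink_throughput_by_vswitch(sample):
        print(f"{dname}: {total:.1f} Mb/s on physical uplinks")
```

The vSwitches at the top of the list are the first places to consider bonding additional physical NICs or moving network-intensive VMs elsewhere.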

VMFS and RDM Considerations

ESX Server supports the mapping of physical LUNs to virtual machines via a method called raw device mapping (RDM). RDM removes VMFS from the storage stack, and VMFS is incorrectly believed to be a source of performance problems. Removing VMFS reduces the total number of addressable LUNs, eliminates the ability to perform storage migrations (Storage VMotion), and greatly increases the effort required for maintenance activities that Site Recovery Manager would otherwise simplify. Moreover, the performance benefits derived from the removal of VMFS are negligible.

See the performance characteristics of VMFS and RDM whitepaper for more information on this subject.

Resources

The top-level performance analysis page: Performance Monitoring and Analysis

VirtualCenter performance counters: Understanding VirtualCenter Performance Statistics

esxtop performance counters: esxtop Performance Counters
