Network Performance Analysis and Monitoring

Version 6

    Introduction

    This page is a living, up-to-date version of the performance analysis methods whitepaper.

    Check Utilization

    esxtop provides network information on its network screen, which is displayed by pressing the 'n' key.

    http://communities-prod-app-2.vmware.com:8080/docs/DOC-5500/esxtop-network-main.JPG

    The following properties of this screen are worth particular attention:

     

    • Each row represents one of several relevant network items on the server: a physical NIC (vmnicX), a virtual switch interface (vswifX), a VM (shown by the VM name), the VMkernel network stack (vmk-tcpip-A.B.C.D), and others.
    • The network items are organized by the virtual switch to which they are attached. The virtual switch name is listed under the DNAME column.
    • Network traffic on the hypervisor's iSCSI initiator shows up on the VMkernel network row, which is named "vmk-tcpip-A.B.C.D", where A.B.C.D is the VMkernel IP address.
    • Network traffic on iSCSI initiators configured in the guest shows up on the vNIC row, which is displayed under the VM's name on the network panel.
    • Total throughput for each item can be observed by summing its transmitted data (MbTX/s) and received data (MbRX/s). As the physical hardware becomes saturated, transmitted and received packets start to be dropped (%DRPTX and %DRPRX, respectively), which, depending on the protocol, may result in a retransmission at a later time.
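
    The throughput and drop checks described above can be sketched in a few lines of Python. This is only an illustration: the sample rows and their values are invented, and the dictionary layout is an assumption. Only the column names (MbTX/s, MbRX/s, %DRPTX, %DRPRX) come from the esxtop network screen.

```python
# Hypothetical sample rows; the values are invented for illustration.
# The keys mirror the esxtop network-screen columns named in the text.
sample_rows = [
    {"name": "vmnic0", "MbTX/s": 410.0, "MbRX/s": 385.5, "%DRPTX": 0.0, "%DRPRX": 0.0},
    {"name": "vmnic1", "MbTX/s": 620.0, "MbRX/s": 350.0, "%DRPTX": 1.5, "%DRPRX": 0.2},
]


def total_throughput(row):
    """Sum transmitted and received megabits per second for one item."""
    return row["MbTX/s"] + row["MbRX/s"]


def is_dropping(row):
    """Return True if the item reports any dropped TX or RX packets."""
    return row["%DRPTX"] > 0 or row["%DRPRX"] > 0


for row in sample_rows:
    status = "dropping packets" if is_dropping(row) else "no drops"
    print(f"{row['name']}: {total_throughput(row):.1f} Mb/s total, {status}")
```

    Any nonzero drop percentage is treated as a warning sign here; a real check might use a small threshold instead.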

    Evaluate the Data

    • Does the physical NIC's reported speed and duplex setting match the expectation of the hardware? Hardware connectivity issues may result in a NIC autonegotiating to a lower speed or half duplex mode.
    • Is there a significant load on the appropriate network items? For instance, is a network-intensive load in a guest actually generating the network activity on its vNIC that is expected? Are storage-intensive loads generating traffic on the vNIC or vmkNIC when the hypervisor or guest initiators are used?
    • Verify that the network traffic is flowing on the appropriate NICs. A typical ESX host may carry network traffic generated by VMs, iSCSI traffic, VMotion traffic, and service console traffic. It is recommended to have separate NICs handle these different types of traffic.
    • During periods of saturation, is the total throughput (MbTX/s summed with MbRX/s) matching expectations? Either the guest or the other end of the communication link may be throttling the performance.
    • Are packets being dropped? When overworked, the hardware will refuse packets, which are reported as dropped transmitted (%DRPTX) and dropped received (%DRPRX) packets.
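
    The questions above can be turned into a simple checklist function. The following Python sketch assumes the values have already been read from esxtop by hand or by some other means; the NicSample type, its field names, and the min_expected_mbps parameter are hypothetical and are not part of esxtop.

```python
from dataclasses import dataclass


@dataclass
class NicSample:
    """One observation of a network item (hypothetical layout)."""
    name: str
    speed_mbps: int      # link speed reported for the NIC
    full_duplex: bool    # True if the NIC negotiated full duplex
    mb_tx: float         # MbTX/s
    mb_rx: float         # MbRX/s
    drop_tx_pct: float   # %DRPTX
    drop_rx_pct: float   # %DRPRX


def evaluate(sample, expected_speed_mbps, min_expected_mbps=0.0):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    if sample.speed_mbps < expected_speed_mbps:
        findings.append(
            f"speed {sample.speed_mbps} Mb/s is below the expected "
            f"{expected_speed_mbps} Mb/s"
        )
    if not sample.full_duplex:
        findings.append("running in half duplex mode")
    if sample.mb_tx + sample.mb_rx < min_expected_mbps:
        findings.append("less traffic than the workload should generate")
    if sample.drop_tx_pct > 0 or sample.drop_rx_pct > 0:
        findings.append("packets are being dropped")
    return findings


# Invented example: a 1 Gb NIC that negotiated down to 100 Mb/s half duplex.
sample = NicSample("vmnic0", 100, False, 40.0, 35.0, 0.0, 2.1)
for finding in evaluate(sample, expected_speed_mbps=1000, min_expected_mbps=200.0):
    print(f"{sample.name}: {finding}")
```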

     

    Correct the System

    • Make sure that the hardware is configured to run at its maximum capability. This means verifying that 1 Gb NICs are not autonegotiating down to 100 Mb/s because they are connected to an older switch. Similarly, ensure that NICs are running in full duplex mode.
    • When network throughput seems lower than expected, apply traditional network diagnosis techniques to investigate every link in the connection. Low throughput at the ESX Server is not necessarily due to server configuration.
    • Verify that VMware Tools is installed on the guests and that TSO, Jumbo Frames, and 10 Gb Ethernet are enabled where possible.
    • Bond multiple physical NICs to virtual switches with high utilization.
    • Give separate virtual switches their own physical NICs, and place network-intensive VMs on their own vSwitches.
    • If VMs running on the same ESX Server communicate with each other, connect them to a dedicated virtual switch so that all network transfers occur in memory and no packets are sent over the wire.
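
    To decide which virtual switches would benefit most from additional physical NICs, it can help to total the uplink traffic per switch. The Python sketch below groups rows by the DNAME column and sums only the physical NIC rows (names starting with "vmnic"), so that VM traffic passing through an uplink is not counted twice. The tuple layout and all sample values are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (DNAME, item name, MbTX/s, MbRX/s). Values are invented.
switch_rows = [
    ("vSwitch0", "vmnic0", 400.0, 300.0),
    ("vSwitch0", "web01", 390.0, 290.0),
    ("vSwitch1", "vmnic1", 50.0, 20.0),
    ("vSwitch1", "vmnic2", 30.0, 10.0),
]


def uplink_throughput_by_vswitch(rows):
    """Total MbTX/s + MbRX/s over the physical NICs of each virtual switch."""
    totals = defaultdict(float)
    for dname, item, mb_tx, mb_rx in rows:
        if item.startswith("vmnic"):  # count physical uplinks only
            totals[dname] += mb_tx + mb_rx
    return dict(totals)


def busiest_first(totals):
    """Sort virtual switches by uplink throughput, highest first."""
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


for dname, mbps in busiest_first(uplink_throughput_by_vswitch(switch_rows)):
    print(f"{dname}: {mbps:.1f} Mb/s across its physical NICs")
```

    The switch at the top of the list is the first candidate for an additional uplink, provided the totals are taken during a representative busy period.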

     

    VMFS and RDM Considerations

    ESX Server supports the mapping of physical LUNs to virtual machines via a method called raw device mapping (RDM). RDM eliminates VMFS from the stack, and VMFS is incorrectly believed to be a source of performance problems. Removing VMFS reduces the total number of addressable LUNs, eliminates the ability to perform storage migrations (Storage VMotion), and greatly increases the effort required for maintenance activities that Site Recovery Manager would otherwise simplify. The performance benefits derived from the removal of VMFS are negligible.

    See the performance characteristics of VMFS and RDM whitepaper for more information on this subject.

    Resources

    The top-level performance analysis page: Performance Monitoring and Analysis

    VirtualCenter performance counters: Understanding VirtualCenter Performance Statistics

    esxtop performance counters: esxtop Performance Counters