0 Replies. Latest reply: Feb 26, 2015 9:29 AM by Daniel73546

    Linux guest E1000 NIC generates abnormally high volumes of traffic in a NAT and traffic-shaped environment

    Daniel73546 Lurker

      This is mainly an FYI, to document this behavior in case other people run into it as well:

      Situation:

        ESXi 4.1.0 build 502767

        Linux guest VM

          The guest VM NATs about 30% of the traffic going through it.

          The guest VM uses E1000 NICs.

          Downstream traffic (on the "inside" interface) is traffic-shaped by another device.

      Observations:

        In this situation, when the traffic is shaped, the Linux guest reports abnormally high utilization on the downstream NIC, considerably more than the combined input on the other interfaces.

        I suspect this issue is E1000-specific (i.e. not VMware per se), but I'm reporting it here because this is where I experienced it.

        This particular guest NIC was mapped to a single VMware ESX host NIC, with no other guest VMs attached to that host NIC.

              The traffic stats on the physical switch port that this host NIC plugs into did not show the abnormally high utilization observed inside the guest; the switch reported 100-200 Mbps less traffic.

              The VMware vSphere client reported the same traffic levels as the physical switch port. In other words, this looks like a VM guest driver issue.

        While the problem was occurring, the TX traffic reported by the E1000 NIC was about 30% higher than the volume of traffic arriving on the outside NICs would explain. The TX rate on this NIC should have been very close to the sum of the RX rates on the other NICs.
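
A minimal Python sketch of that cross-check, assuming a Linux guest that exposes the standard /proc/net/dev counters. The interface names used in the example (eth0 as the inside NIC, eth1-eth3 as the outside NICs) are hypothetical placeholders, and the sketch ignores counter wraparound as well as any traffic the guest itself originates or terminates. By the reasoning above, a ratio close to 1.0 is expected; the behavior described here corresponds to roughly 1.3.

```python
import time


def parse_proc_net_dev(text):
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev contents."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes; field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def byte_rates(before, after, seconds):
    """Per-interface (rx, tx) rates in bytes/s between two counter snapshots."""
    return {
        name: ((after[name][0] - before[name][0]) / seconds,
               (after[name][1] - before[name][1]) / seconds)
        for name in after
        if name in before
    }


def tx_to_rx_ratio(rates, inside, outside):
    """TX rate of the inside NIC divided by the summed RX rate of the outside NICs."""
    expected = sum(rates[name][0] for name in outside)
    return rates[inside][1] / expected


def measure(inside, outside, seconds=10.0, path="/proc/net/dev"):
    """Sample the counters twice, `seconds` apart, and return the TX/RX ratio."""
    with open(path) as f:
        before = parse_proc_net_dev(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_proc_net_dev(f.read())
    return tx_to_rx_ratio(byte_rates(before, after, seconds), inside, outside)
```

For example, measure("eth0", ["eth1", "eth2", "eth3"]) samples the guest's counters over 10 seconds. Because the guest's own counters are the suspect here, the same sum can be compared against the switch port or vSphere figures.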

        Rebooting the Linux guest did not resolve the problem.

        Changing the Linux guest's NICs (all four of them) from E1000 to VMXNET3 did resolve the problem.
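
To confirm which driver each interface in a Linux guest is actually bound to before and after such a change, the kernel exposes the bound driver as a symlink at /sys/class/net/<iface>/device/driver. The following is a small Python sketch that reads those symlinks. The function names are hypothetical, and it assumes the usual mapping, in which the emulated E1000 adapter is handled by the e1000 driver and VMXNET3 by the vmxnet3 driver. Interfaces without a backing device, such as lo, are skipped.

```python
import os


def nic_driver(iface, sysfs_net="/sys/class/net"):
    """Return the name of the kernel driver bound to iface (e.g. 'e1000', 'vmxnet3')."""
    link = os.path.join(sysfs_net, iface, "device", "driver")
    # The symlink points at .../drivers/<name>; the last path component is the driver.
    return os.path.basename(os.readlink(link))


def list_nic_drivers(sysfs_net="/sys/class/net"):
    """Map each interface that has a backing device to its driver name."""
    drivers = {}
    for iface in sorted(os.listdir(sysfs_net)):
        link = os.path.join(sysfs_net, iface, "device", "driver")
        # Virtual interfaces (lo, bridges, tunnels) have no device/driver link.
        if os.path.islink(link):
            drivers[iface] = nic_driver(iface, sysfs_net)
    return drivers
```

On a guest like the one described here, list_nic_drivers() would be expected to report e1000 for all four interfaces before the change and vmxnet3 afterwards.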

        I do not know whether the two 30% figures are correlated, but based on similar behavior I observed in one other (non-VMware) setting, I suspect it's just a coincidence.