e1000 & vmxnet broken when using multiple subnets

VMware Cloud Community

OK, I have ESXi 4 in a box (Athlon II 4200+, nvidia 61xx chipset (Asus M2NPV-v), SATA drives).

vmnic0 - forcedeth - nvidia

VMware management interface 192.168.a.b/24

vmnic1 - e1000e - Intel CT gigabit PCIe (esxcfg-module -s IntMode=1 - disables MSI-X, as it prevents the card from seeing a connected cable)

vmnic2 - e1000 - Intel GT gigabit PCI

I have installed

Ubuntu 9.10 - no VMware Tools, 192.168.a.c/24

ClearOS (CentOS 5.4 based distribution, Gateway Mode)

PROBLEM

The ClearOS acts as a router forwarding WAN traffic from eth0 to eth1, providing caching DNS etc.

The WAN connection is in this case an ADSL2+ 18/1.2 Mbps connection going via another hardware ADSL ModemRouter. The WAN connection is all on subnet 192.168.d.0

The WAN is physically connected to vmnic1. Only ClearOS has access to vmnic1, but it is just run as a switch with a "flexible" network adapter in the VM.

The LAN is physically connected via a simple Netgear Gigabit switch. I am using a Windows 7 machine.

Basically the setup works fine when I have not installed VMware Tools (including vmxnet). Once I install the tools and activate the vmxnet driver, ClearOS routes so slowly that I am unable to access web pages or email, although bizarrely SSH works fine, presumably because of the low level of traffic.

It's not a CPU issue. All performance stats show <100MHz in use.

The ClearOS box is able to route for the Ubuntu VM on the same machine, both on the same vmnic or another one. Running Firefox and BitTorrent on the Ubuntu box (accessed via the VMware console) is fine. Note that all this traffic is via the same subnet and in fact via the forcedeth driver.

If I switch the LAN interface between vmnic0 (forcedeth) and vmnic2 (e1000), then the system works fine on vmnic0 and not on vmnic2.

Clearly there is an interaction between vmxnet, e1000 and CentOS that renders traffic so slow that the system is unusable, as the combinations vmxnet + forcedeth + CentOS and pcnet32 + e1000 + CentOS both work.

Other combinations I have not tried are

a) swapping the vmnic1/2 roles around to see if it's unique to e1000 rather than e1000e

b) rebuilding the VM to use something other than the "flexible" adapter setting

c) versions of ESXi other than 4.0
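For option b), it may help to first check which adapter type each virtual NIC is configured with. A minimal sketch, assuming the VM's .vmx file records the type in ethernetN.virtualDev lines (as far as I know, a "flexible" adapter has no such line or shows "vlance", but treat that as an assumption). The function name and the sample file contents are made up for illustration:

```shell
#!/bin/sh
# show_vnic_types FILE: print the virtual adapter type lines of a .vmx file.
# (Hypothetical helper; only standard grep is used.)
show_vnic_types() {
    grep -i '^ethernet[0-9]*\.virtualDev' "$1" \
        || echo "no virtualDev lines found in $1"
}

# Demo on a throwaway sample file (illustrative contents, not from this thread).
sample=$(mktemp)
cat > "$sample" <<'EOF'
ethernet0.present = "TRUE"
ethernet0.virtualDev = "e1000"
ethernet1.present = "TRUE"
EOF
show_vnic_types "$sample"
rm -f "$sample"
```

On a real host you would point show_vnic_types at the VM's own .vmx file on the datastore instead of the sample.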

Cheers

Rajiv

4 Replies

Try turning off TCO on the VM's NIC; that might help.

Sorry, that's too brief for me to understand.

TCO ??

And at what level (in VMware, in the guest OS, on the VMware command line)?

Cheers

Rajiv

Sorry, I meant TSO/GSO/TX offloading, and it is set within the guest VM.

Results of ethtool -k on the working forcedeth
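For anyone else wondering how to do that inside the guest, here is a rough sketch using ethtool -K. The interface name eth1 and the IFACE/DRY_RUN variables are assumptions for illustration; by default the script only prints the command so you can check it first:

```shell
#!/bin/sh
# Sketch: turn off TX checksum offload, TSO and GSO on one guest interface.
# IFACE and DRY_RUN are hypothetical variable names, not anything ClearOS defines.
disable_offloads() {
    iface="$1"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Dry run: show what would be executed.
        echo "ethtool -K $iface tx off tso off gso off"
    else
        # Needs root inside the guest; the setting is lost on reboot.
        ethtool -K "$iface" tx off tso off gso off
    fi
}

disable_offloads "${IFACE:-eth1}"
```

Run it with DRY_RUN=0 as root to actually apply the change, then check the result with ethtool -k on the same interface.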

Offload parameters for eth0:

Cannot get device rx csum settings: Operation not supported

Cannot get device tx csum settings: Operation not supported

Cannot get device udp large send offload settings: Operation not supported

rx-checksumming: off

tx-checksumming: off

scatter-gather: on

tcp segmentation offload: on

udp fragmentation offload: off

generic segmentation offload: off

generic-receive-offload: off

I then followed a different post suggesting that if one adds the e1000 as an e1000 rather than flexible in the VM configuration, it works. Lo and behold, it works. New ethtool results for the e1000 card follow; apologies for not having results for the e1000 card as a vmxnet.

Cannot get device udp large send offload settings: Operation not supported

rx-checksumming: on

tx-checksumming: on

scatter-gather: on

tcp segmentation offload: on

udp fragmentation offload: off

generic segmentation offload: off

generic-receive-offload: off

So it would seem that while offloading may be the root cause, it's some driver issue, as forcing the VM to use the e1000 driver (for an e1000 card) rather than vmxnet seems to fix the issue.
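To make the comparison between the two ethtool -k listings easier to see, here is a small sketch that prints only the settings that differ. The values are copied from the two outputs above (the "Cannot get device" lines are left out); the variable and function names are made up for illustration:

```shell
#!/bin/sh
# Offload settings from the first listing (flexible adapter, forcedeth LAN).
nic_a='rx-checksumming: off
tx-checksumming: off
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off'

# Offload settings from the second listing (adapter added as e1000).
nic_b='rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
udp fragmentation offload: off
generic segmentation offload: off
generic-receive-offload: off'

# offload_diff A B: print "name: old -> new" for each setting that differs.
offload_diff() {
    printf '%s\n---\n%s\n' "$1" "$2" | awk -F': ' '
        $0 == "---" { second = 1; next }   # separator between the two listings
        !second     { a[$1] = $2; next }   # remember values from listing A
        ($1 in a) && a[$1] != $2 { print $1 ": " a[$1] " -> " $2 }
    '
}

offload_diff "$nic_a" "$nic_b"
```

With the values above this prints only the rx-checksumming and tx-checksumming lines (off -> on), so checksum offload is the one visible difference between the two listings; TSO is on in both.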

This is a great relief to me as for low end VM boxes, a £20 ethernet card is much more appropriate, although I suppose the £60 server cards are not so excessive.

Cheers

Rajiv