I saw the specs on this brand-new NIC from Intel and thought, "Aha, just the perfect NIC to add to our ESX servers."
I checked that it's covered in the igb driver, and went ahead and ordered it from Dell.
So, I slapped the card into one of our reserve ESX hosts. The card was detected properly and the 4 new vmnics showed up. I even added them to the vswitch, and everything looked like it was working... until I realized that the virtual machines on that host became unreachable--packets were being dropped left and right (note: SOME go through, but very few). Oddly enough, the service console (on the same vswitch) is reachable just fine.
I removed all the NICs from the vswitch until it was down to a single port on the 1000VT. The problem persisted. At this point, I'm thinking there's something wrong with the 1000VT driver, as the Broadcom NIC, with the exact same settings on both the ESX side and the switch side, works just fine.
Just as a note, we do use VLANs, and the ports on the switch (Cisco 4948-10GE) are configured as trunks. If the VM is put on VLAN 1, it works fine. Any other VLAN, and it doesn't work. On the Broadcom, it works regardless of what VLAN I throw the VM onto. The service console is on VLAN 1 as well, FYI.
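For context, a trunk port setup like the one described would typically look something like this on Cisco IOS (the interface name and VLAN numbers below are my assumptions, not taken from the post). Note that on a dot1q trunk, traffic on the native VLAN travels untagged, which would explain why VLAN 1 (and the service console) keeps working while tagged VLANs fail:

```
! Hypothetical trunk config for the ESX uplink port
interface GigabitEthernet1/1
 description uplink to ESX vmnic
 switchport trunk encapsulation dot1q
 switchport trunk native vlan 1
 switchport mode trunk
 spanning-tree portfast trunk
```

Since only native (untagged) traffic gets through, the symptoms point at the driver's handling of 802.1Q-tagged frames rather than at the switch config.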
Has anyone else run into something similar? I saw another post about a Dell OEM card acting differently than a retail one--could this be another instance of that? I know Intel has a much newer version of the igb driver--would compiling that and loading the module potentially fix the problem?
Thanks in advance!
Thanks for the pointer--is there an ETA on the patch? Also, is there any workaround in the meantime? Given that the VMs seem to get traffic just fine when put in the native VLAN, it would seem to be a general VLAN support problem with the driver... Would building the current driver from source help (or rather, will an unmodified driver work in ESX?)?
Thanks for the pointer--is there an ETA on the patch?
I am not aware of any date (and dates always fluctuate anyway), but I can guarantee we're working hard to release this fix as soon as possible.
Also, is there any workaround in the meantime?
People in the other thread already submitted some (constraining, I agree) workarounds. I'm afraid there is no easy and effective way around this issue for now 😕
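One constraining workaround consistent with reports elsewhere in this thread (single-VLAN setups reportedly work) would be to stop tagging in the vswitch (VST) and let the switch place the port in the VM's VLAN instead. This is a sketch only; the interface name, VLAN 100, port group name, and vSwitch name are all hypothetical placeholders:

```
! On the switch: make the ESX uplink an access port in the VMs' VLAN
interface GigabitEthernet1/1
 switchport mode access
 switchport access vlan 100
```

```
# On the ESX service console: clear the port group VLAN ID (0 = untagged)
esxcfg-vswitch -p "VM Network" -v 0 vSwitch1
```

The obvious downside is that every VM on that uplink ends up in the same VLAN, which is why it only helps environments that don't need multiple VLANs per vswitch.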
Given that the VMs seem to get traffic just fine when put in the native VLAN, it would seem to be a general VLAN support problem with the driver... Would building the current driver from source help (or rather, will an unmodified driver work in ESX?)?
That is indeed a bug in the code handling VLANs. You won't be able to compile Intel's driver and use it on ESX, I'm sorry.
Are you setting the VLAN value in the ESX config?
I did a new install of 4 ESX servers (3.5) this week and have not seen the problem that you describe, but I have only set up a few test VMs at this point. I'm not using the ESX VLAN value; instead I'm setting the VLAN value on my Cisco switch. In my environment this works because all of the production VMs will exist in the same VLAN.
I guess I'm just wanting to make sure that this only affects people who are trying to set the VLAN value from within the vSwitch settings.
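For anyone following along, the "ESX VLAN value" being discussed is the port group VLAN ID, i.e. VST mode, where the vswitch tags the frames. On ESX 3.5 it can be inspected and set from the service console like this (the port group name, vSwitch name, and VLAN 100 are placeholder values I'm assuming for illustration):

```
# List current vSwitch and port group configuration
esxcfg-vswitch -l

# Tag the "VM Network" port group with VLAN 100 (VST mode)
esxcfg-vswitch -p "VM Network" -v 100 vSwitch0
```

If the VLAN is instead set on the switch port (access mode) and the port group VLAN ID is left at 0, that's EST mode, which is the configuration being described as unaffected here.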
This issue seems to occur ONLY when the virtual machines are assigned to a non-native VLAN on a trunked connection to the switch. If you assign the ESX port to a single VLAN like you are, I believe it works fine.
And sorry I forgot to mention, we use 3.5 as well.
This issue has been driving me nuts for the past couple of days, and I'm glad that I'm not the only one and that a patch is in the works. Like the original poster, I figured the new Intel 1000/VT quad-port NICs would be a good fit in our new ESX 3.5 environment, since this was the latest card from Intel and it did make the I/O compatibility list on page 27 for Intel Networking Devices. What the list fails to mention is that VST mode isn't supported. It also isn't mentioned in esx3_vlan_wp.pdf, which states "All of the Intel and Broadcom NIC controllers support both VST mode and VGT mode..."
We have three ESX 3.5 servers and one ESX 3.02 server. At first I thought it was a version or setup issue, since the problem only occurred in the 3.5 environment, but after more testing it came down to the igb driver. This issue impacts any NICs using the igb driver together with VST VLAN IDs.
VMware said that this driver should be fixed in the February patch release (Bug# 223143). After a lot of frustration, I finally found the KB today when I figured out that the NICs and drivers were what differed between our two environments. I guess I didn't bother to check here earlier since I assumed VMware's HCL and VLAN whitepapers were accurate at the time I purchased the VT NICs. I wish I had checked here sooner....
Seems like the March 6th patch has it (despite it not being mentioned in the patch summary):
ESX-1003515 10:54:11 03/10/08 Fix for trunking VLANs with igb driver
and it seems to work