Hi,
ESXi 4.1.0,260247, on Intel S5520HC board with onboard dual Intel 82575EB adapters
I have had a few VMs running for a month without issues, and then one of the VMs stopped talking to the default gateway and the ESXi host. The VM, however, could still ping the other running VMs. After rebooting the VM a few times the problem went away.
A few days later another VM experienced the same thing. This time I rebooted the ESXi host, and only one of the four VMs came back alive (it can ping the gateway and our workstations can ping it). The rest of the VMs came back after a few reboots, and I have ping -t running on a workstation to keep these VMs alive (this works somehow). Just one last VM still cannot ping the ESXi host or the default gateway. It can still ping its VM peers.
I tried to bring in another VM from a VMware Server 2 (via Converter); it also cannot connect to the network outside the host.
I Googled for a while and found some posts that talk about ARP problems or problems with the host network adapters. I am afraid to reboot the host again as my VMs may not come back online.
I have an Intel PCI NIC (82559) I can possibly install in the host, but I am afraid that if I turn off the host I will not be able to get the VMs up after I start it again.
Additional info
VM 1 (CentOS) - working - 192.168.20.216
VM 2 (W2008) - working - 192.168.20.212
VM 3 (W2003) - working - 192.168.20.207
VM 4 (CentOS) - not working - 192.168.20.149
ESXi host - 192.168.20.218
default gateway - 192.168.20.254
Adapter Details:
Name: vmnic1
Location: PCI 01:00,1
Driver: igb
Networks: 192.168.20.64-192.168.20.127
Please advise,
JC
Welcome to the Community. If you are able to ping between the VMs with no errors, I think the ESXi networking is functioning. You say you have two physical NICs on your host: do you have two virtual switches configured, one for each NIC, or a single virtual switch with a NIC team? If you have a NIC team, how is the load balancing configured, IP Hash or Port Based? If IP Hash, make sure your physical switch is configured for static link aggregation (EtherChannel in Cisco speak).
Hi David thanks for the response.
It's probably not the physical NIC port, because the VMs that are working can ping other workstations on the network and also reach internet sites, all through this physical NIC port.
Thanks,
Joe
Joe - good point - what about the configuration of the virtual switch - how do you have it configured?
Joe,
This sounds like an issue at the VM's network level.
What do the current network settings on the guest VM look like?
Can you post an "ifconfig -a" of the non-working VM and of a working VM, and an "esxcfg-vmknic -l" of the host? Please mask out anything that you don't want to make public.
ifconfig -a for the VM that doesn't work:
ifconfig -a for the VM that works:
and for the host, esxcfg-vmknic -l:
Interface  Port Group/DVPort   IP Family  IP Address      Netmask        Broadcast       MAC Address        MTU   TSO MSS  Enabled  Type
vmk0       Management Network  IPv4       192.168.20.218  255.255.255.0  192.168.20.255  00:15:17:e9:9a:d6  1500  65535    true     STATIC
If everything works ok when VM4 is off it sounds like it is messing with your network. Did you build VM4 from scratch?
You could try adding a new NIC to VM4 and disable/remove the existing NICs. Give it a new IP if you can and power it on.
Also check your physical switch to make sure it is not doing anything out of the ordinary.
geediu wrote:
Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch1     128        5           128               1500  vmnic1
PortGroup Name  VLAN ID  Used Ports  Uplinks
VM Network 2    4095     3           vmnic1
I just noticed the VLAN ID.
Are you using VLANs in your environment? If not, change the VLAN ID to 0 instead of 4095. ID 4095 is special and is used when you want the guest to deal with the VLAN tagging.
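To make the meaning of the port group VLAN ID concrete, here is a minimal sketch of how the value maps to a tagging mode: 0 means the vSwitch does no tagging (EST, the physical switch port decides the VLAN), 1 to 4094 means the vSwitch tags for that VLAN (VST), and 4095 passes tags through to the guest (VGT). The function name vlan_mode is made up for illustration; it is not an ESXi command.

```shell
#!/bin/sh
# Hypothetical helper (not part of ESXi): classify a port group VLAN ID
# into the tagging mode it implies on a standard vSwitch.
vlan_mode() {
    id="$1"
    # Reject empty or non-numeric input.
    case "$id" in
        ''|*[!0-9]*) echo "invalid"; return 1 ;;
    esac
    if [ "$id" -eq 0 ]; then
        echo "EST"    # no tagging by the vSwitch; the physical switch port assigns the VLAN
    elif [ "$id" -ge 1 ] && [ "$id" -le 4094 ]; then
        echo "VST"    # the vSwitch tags and untags frames for this VLAN
    elif [ "$id" -eq 4095 ]; then
        echo "VGT"    # tags are passed through; the guest OS must handle them
    else
        echo "invalid"; return 1
    fi
}
```

For example, vlan_mode 4095 prints VGT, which matches the port group listed above, and vlan_mode 10 prints VST.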
Hi Robert,
I already tried adding a new NIC to VM4, as you can probably tell from the eth1, because I thought something was wrong with eth0. I set it to DHCP and it wasn't even able to pick up an IP.
As for your other response: we do have VLANs running, and the 192.168.20.0 subnet belongs to VLAN 10.
I saw eth1 but figured I would ask/suggest it anyway.
Is there a reason why you are using 4095 instead of VLAN 10?
Is your VM using an e1000 NIC?
I set up ping -t to those VMs from my machine as a connectivity test.
I changed the VLAN from 4095 to 10 on the vSwitch, and the VMs immediately lost connectivity (pings timed out).
I changed it back from VLAN 10 to VLAN 4095 on the vSwitch, and my pings received replies again.
Yes, I tried VMXNET2, VMXNET3 and E1000 before, all with the same result. Currently all the VMs mentioned are using VMXNET3.
Any comments regarding what I said earlier about the ARP table problems and the onboard Intel NIC issues I read about from Google searches?
Thanks for your help thus far.
Joe
Both Intel 82575EB and 82559 are on the HCL for ESXi 4.1. Your MOBO is also on the HCL.
I did a quick google and found some VMware related issues with the NIC - seems to be a driver issue? These were all from '08:
http://communities.vmware.com/thread/173597?start=15&tstart=0
http://communities.vmware.com/thread/173549
You could try swapping out the NIC or updating the drivers (if an update exists) to see if that resolves the issue.
Driver update: http://bitbud.com/2010/05/12/updating-network-drivers-on-vmware-esxi/
Check driver: Determining NIC firmware and driver version in ESX/ESXi 4.x
http://downloads.vmware.com/d/details/esx_esxi40_intel_82575_82576_dt/ZHcqYmR0QGpidGR3
I think that you should also check your VLAN configuration. If you are not tagging VLAN 10 on the pSwitch port that the host is plugged into, then the vSwitch port group VLAN ID should be set to 0.
I would say that if you do not have a good reason for VGT, avoid it. Here are some good VMware KB articles on VLAN/VGT:
Sample configuration of virtual machine(VM) VLAN Tagging (VGT Mode) in ESX
Sample configuration of virtual switch VLAN tagging (VST Mode)
P.S. By swapping out the NIC I meant install and use the 82559.
Message was edited by: Walfordr correct typo
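As a rough sketch of the port group change suggested above, the VLAN ID can be listed and set from the ESXi console with esxcfg-vswitch. The port group and vSwitch names are taken from the listing earlier in this thread. The wrapper function set_portgroup_vlan and the RUN variable are made up for illustration. By default the script only prints the commands, since a wrong VLAN ID can cut the VMs off from the network immediately.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RUN="" on the ESXi console to actually run them.
RUN="${RUN-echo}"

# Hypothetical wrapper: show the current vSwitch layout, then set the
# VLAN ID of one port group. Arguments: port group, VLAN ID, vSwitch.
set_portgroup_vlan() {
    portgroup="$1"
    vlan_id="$2"
    vswitch="$3"
    # List vSwitches, port groups and their current VLAN IDs.
    $RUN esxcfg-vswitch -l
    # Set the VLAN ID on the port group (0 = untagged, 4095 = guest tagging).
    $RUN esxcfg-vswitch -p "$portgroup" -v "$vlan_id" "$vswitch"
}
```

Calling set_portgroup_vlan "VM Network 2" 0 vSwitch1 would target the port group shown earlier. Whether 0 or 10 is the right value depends on whether the physical switch port is an access port or a trunk that tags VLAN 10.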
http://www.ewams.net/?view=upgradingvsphere4nicdrivers
For step 5, you would grep for igb instead of bnx2x. igb is the driver for the Intel NICs.
e.g.:
~ # esxupdate query --vib-view |grep igb
deb_vmware-esx-drivers-net-igb_400.1.3.19.12.2-2vmw.1.4.348481 installed 2011-01-13T01:34:35+00:00
~ #
Hi Robert,
I ended up checking the pswitch port and matched the port and VLAN settings with ports used by other physical servers. All is well now. Thanks again.
Joe
I have just installed ESXi 5.1 and then changed the VLAN ID of the Management Network from All to 10, but when I clicked OK my vSphere Client stopped responding, and now I am unable to access or ping the server. Can anybody please help me resolve this issue?
Thanks