VMware Cloud Community
geediu201110141
Contributor

VM cannot ping default gateway or ESXi host

Hi,

ESXi 4.1.0, build 260247, on an Intel S5520HC board with onboard dual Intel 82575EB adapters.

I have had a few VMs running for a month without issues, and then one of the VMs stopped talking to the default gateway and the ESXi host. The VM could still ping the other running VMs, however. After rebooting the VM a few times the problem went away.

A few days later another VM experienced the same thing. This time I rebooted the ESXi host, and only one of the four VMs came back alive (it can ping the gateway and our workstations can ping it). The rest of the VMs came back after a few reboots, and I have ping -t running on a workstation to keep these VMs alive (it works somehow). Just one last VM still cannot ping the ESXi host or the default gateway. It can still ping its VM peers.

I tried to bring in another VM from a VMware Server 2 host (via Converter), and it also cannot connect to the network outside the host.

I Googled for a while and found some posts that talk about ARP problems or problems with the host network adapters. I am afraid to reboot the host again as my VMs may not come back online.

I have an Intel PCI NIC (82559) I could install in the host, but I am afraid to turn off the host and then not be able to get the VMs up after I start it again.

Additional info

VM 1 (CentOS) - working - 192.168.20.216

VM 2 (W2008) - working - 192.168.20.212

VM 3 (W2003) - working - 192.168.20.207

VM 4 (CentOS) - not working  - 192.168.20.149

ESXi host - 192.168.20.218

default gateway  - 192.168.20.254

Adapter Details:

Name: vmnic1

Location: PCI 01:00,1

Driver: igb

Networks: 192.168.20.64-192.168.20.127
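
One thing that stands out in these details: none of the addresses listed above fall inside the range shown under Networks. Here is a quick Python check, as a sketch only. It assumes the Networks field is just the hint the vSphere Client derives from traffic observed on the NIC, so a mismatch is worth noting but does not prove a fault. The helper name `outside_observed_range` is made up for illustration.

```python
import ipaddress

# Range shown under "Networks" for vmnic1 in the adapter details above.
OBSERVED_FIRST = int(ipaddress.ip_address("192.168.20.64"))
OBSERVED_LAST = int(ipaddress.ip_address("192.168.20.127"))

# Addresses listed in the post above.
ADDRESSES = {
    "VM 1 (CentOS)": "192.168.20.216",
    "VM 2 (W2008)": "192.168.20.212",
    "VM 3 (W2003)": "192.168.20.207",
    "VM 4 (CentOS)": "192.168.20.149",
    "ESXi host": "192.168.20.218",
    "default gateway": "192.168.20.254",
}


def outside_observed_range(addresses):
    """Return the names whose address is not within the observed range."""
    return [
        name
        for name, addr in addresses.items()
        if not OBSERVED_FIRST <= int(ipaddress.ip_address(addr)) <= OBSERVED_LAST
    ]


for name in outside_observed_range(ADDRESSES):
    print(f"{name} ({ADDRESSES[name]}) is outside the observed range")
```

Every listed address ends up in the output, including the working VMs, so on its own this does not single out VM 4.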

Please advise,

JC

16 Replies
weinstein5
Immortal

Welcome to the Community. If you are able to ping between the VMs with no errors, I think the ESXi networking is functioning. You say you have two physical NICs on your host: do you have two virtual switches configured, one for each NIC, or a single virtual switch with a NIC team? If you have a NIC team, how is the load balancing configured: IP Hash or Port Based? If IP Hash, make sure your physical switch is configured for static link aggregation (EtherChannel in Cisco speak); as far as I know, a standard vSwitch on ESXi 4.x does not negotiate LACP.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
geediu201110141
Contributor

Hi David, thanks for the response.

It's probably not the physical NIC port, because the VMs that are working can ping other workstations on the network and also reach internet sites, all through this physical NIC port.

Thanks,

Joe

weinstein5
Immortal

Joe - good point - what about the configuration of the virtual switch - how do you have it configured?

geediu201110141
Contributor


Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1         128         5           128               1500    vmnic1
  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VM Network 2          4095     3           vmnic1
Walfordr
Expert

geediu wrote:

Hi David thanks for the response.

It's probably not the physical NIC port because the VMs that are working, they can ping other workstations on the network and also hit internet sites, all through this physical NIC port.

Thanks,

Joe

Joe,

This sounds like an issue at the VM's network level.

What do the current network settings on the guest VM look like?

Can you post the output of "ifconfig -a" from the non-working VM and from a working VM, and of "esxcfg-vmknic -l" from the host? Please mask out anything that you don't want to make public.

Robert -- BSIT, VCP3/VCP4, A+, MCP (Wow I haven't updated my profile since 4.1 days) -- Please consider awarding points for "helpful" and/or "correct" answers.
geediu201110141
Contributor

ifconfig -a for the VM that doesn't work:

eth1      Link encap:Ethernet  HWaddr 00:0C:29:EB:F0:86
          inet addr:192.168.20.149  Bcast:192.168.20.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:feeb:f086/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:109 errors:0 dropped:0 overruns:0 frame:0
          TX packets:113 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8828 (8.6 KiB)  TX bytes:11068 (10.8 KiB)

ifconfig -a for the VM that works:

eth0      Link encap:Ethernet  HWaddr 00:0C:29:07:69:17
          inet addr:192.168.20.216  Bcast:192.168.20.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe07:6917/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:26877701 errors:0 dropped:0 overruns:0 frame:0
          TX packets:20041814 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8840033482 (8.2 GiB)  TX bytes:5535708019 (5.1 GiB)

and for the host, esxcfg-vmknic -l:

Interface  Port Group/DVPort   IP Family IP Address     Netmask         Broadcast       MAC Address       MTU     TSO MSS   Enabled Type

vmk0       Management Network  IPv4      192.168.20.218 255.255.255.0   192.168.20.255  00:15:17:e9:9a:d6 1500     65535     true    STATIC

And something really bizarre just happened. About half an hour ago, when I wanted to respond to this post, I was going to PuTTY into the ESXi host and I couldn't connect. I tried to ping it and got no reply. I then tried to ping the VM that doesn't work (192.168.20.149) and I could ping it......?!!?!? So I PuTTYed into it instead and powered it off. About 5 minutes after I powered off that VM, I was able to ping my ESXi host again and subsequently PuTTY in to run the esxcfg-vmknic command. I powered the problematic VM back on, and sure enough I cannot ping it.
What is going on.....?!?!

Walfordr
Expert

If everything works ok when VM4 is off it sounds like it is messing with your network. Did you build VM4 from scratch?

You could try adding a new NIC to VM4 and disable/remove the existing NICs.  Give it a new IP if you can and power it on.

Also check your physical switch to make sure it is not doing anything out of the ordinary.

Walfordr
Expert

geediu wrote:


Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1         128         5           128               1500    vmnic1
  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VM Network 2          4095     3           vmnic1

I just noticed the VLAN ID.

Are you using VLANs in your environment? If not, change the VLAN ID to 0 instead of 4095. ID 4095 is special and is used when you want the guest to deal with the VLAN tagging.
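
To make the three cases concrete, here is a minimal Python sketch of how a port group VLAN ID maps to a tagging mode. EST, VST and VGT are, as far as I recall, the names VMware's documentation uses; the function `tagging_mode` is only an illustration and not any VMware API.

```python
def tagging_mode(vlan_id: int) -> str:
    """Classify a standard vSwitch port group VLAN ID by tagging mode."""
    if vlan_id == 0:
        # No tagging on the vSwitch; the physical switch handles VLANs.
        return "EST"
    if 1 <= vlan_id <= 4094:
        # The vSwitch tags and untags frames for this one VLAN.
        return "VST"
    if vlan_id == 4095:
        # Tags are passed through; the guest OS handles them.
        return "VGT"
    raise ValueError(f"VLAN ID out of range: {vlan_id}")


for vid in (0, 10, 4095):
    print(vid, tagging_mode(vid))
```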

geediu201110141
Contributor

Hi Robert,

I already tried adding a new NIC to VM4; as you probably noticed, it is eth1, because I thought something was wrong with eth0. I set it to DHCP and it wasn't even able to pick up an IP.

As for your other response: we do have VLANs running, and the 192.168.20.0 subnet belongs to VLAN 10.

Walfordr
Expert

I saw eth1 but figured I would ask/suggest it anyway.

Is there a reason why you are using 4095 instead of VLAN 10?

Is your VM using an e1000 NIC?

geediu201110141
Contributor

I set up ping -t to those VMs from my machine as a connectivity test.

I changed the VLAN from 4095 to 10 on the vSwitch: the VMs immediately lost connectivity (pings timed out).

I changed back from VLAN 10 to VLAN 4095 on the vSwitch: my pings received replies again.

Yes, I tried VMXNET 2, VMXNET 3, and E1000 before, all with the same result. Currently all VMs mentioned are using VMXNET 3.

Any comments regarding what I said earlier about arp table problems and the onboard Intel NIC issues I read about from Google searches?

Thanks for your help thus far.

Joe

Walfordr
Expert

Both Intel 82575EB and 82559 are on the HCL for ESXi 4.1. Your MOBO is also on the HCL.

I did a quick google and found some VMware related issues with the NIC - seems to be a driver issue? These were all from '08:
http://communities.vmware.com/thread/173597?start=15&tstart=0
http://communities.vmware.com/thread/173549

You could try swapping out the NIC or updating the drivers (if an update exists) to see if that resolves the issue.
Driver update: http://bitbud.com/2010/05/12/updating-network-drivers-on-vmware-esxi/
Check driver: Determining NIC firmware and driver version in ESX/ESXi 4.x

http://downloads.vmware.com/d/details/esx_esxi40_intel_82575_82576_dt/ZHcqYmR0QGpidGR3


I think that you should also check into your VLAN configuration. If you are not tagging VLAN 10 on the pSwitch port that the host is plugged into, then the vSwitch portgroup VLAN ID should be set to 0.

I say if you do not have a good reason for VGT, avoid it. Here are some good VMware KB articles on VLAN/VGT:

Sample configuration of virtual machine(VM) VLAN Tagging (VGT Mode)  in ESX

Sample configuration of virtual switch VLAN tagging (VST Mode)
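
As a rough illustration of why the port group VLAN ID has to match what the pSwitch port sends, here is a simplified Python model of inbound filtering on a port group. It is a sketch of my understanding, not VMware code; `frame_reaches_vnic` and `VGT_VLAN_ID` are hypothetical names, and the model only says whether a frame is handed to the VM's vNIC, not whether the guest OS then accepts it.

```python
from typing import Optional

VGT_VLAN_ID = 4095  # "all VLANs": frames are handed to the guest as-is


def frame_reaches_vnic(portgroup_vlan: int, frame_tag: Optional[int]) -> bool:
    """Simplified model of inbound filtering on a vSwitch port group.

    frame_tag is the 802.1Q VLAN ID on the frame as it arrives from the
    physical switch, or None for an untagged frame.
    """
    if portgroup_vlan == VGT_VLAN_ID:
        # VGT: tagged and untagged frames are both passed to the guest.
        return True
    if portgroup_vlan == 0:
        # No VLAN on the port group: only untagged frames are delivered.
        return frame_tag is None
    # VST: only frames tagged with the port group's own VLAN are delivered.
    return frame_tag == portgroup_vlan


# Assumption: the pSwitch port is an access port, so frames arrive untagged.
untagged = None
for vlan in (0, 10, 4095):
    print(vlan, frame_reaches_vnic(vlan, untagged))
```

Under that assumption only 0 and 4095 deliver traffic, which would be consistent with the ping -t test earlier in the thread, where switching the port group to VLAN 10 cut the VMs off. That is one reading of the symptoms, not a diagnosis.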

P.S. By swapping out the NIC I meant install and use the 82559.

Walfordr
Expert

http://www.ewams.net/?view=upgradingvsphere4nicdrivers

For step 5, you would grep for igb instead of bnx2x. igb is the driver for the Intel NICs.

i.e.:

~ # esxupdate query --vib-view |grep igb

deb_vmware-esx-drivers-net-igb_400.1.3.19.12.2-2vmw.1.4.348481             installed     2011-01-13T01:34:35+00:00

~ #

geediu201110141
Contributor

Hi Robert,

I ended up checking the pswitch port and matched the port and VLAN settings with ports used by other physical servers. All is well now. Thanks again.

Joe

wajidmalik
Contributor

I have just installed ESXi 5.1, then I changed the VLAN ID of the Management Network from All to 10, but when I clicked OK my vSphere Client stopped responding, and now I am unable to access or ping the server. Can anybody please help me resolve this issue?

Thanks
