VMware Cloud Community
thecoffeeguy
Contributor

Losing network connectivity on some VMs

I am not really sure what happened, but I came in this morning and a couple of VMs are having network problems. All are Windows 2003 and show a mix of weird issues:

- Some cannot ping the gateway

- Some cannot be reached over RDP

- Some can be pinged, but not otherwise reached

I am looking at the setup and everything appears to be OK. Each VM has a NIC connected to a virtual switch. I am at a bit of a loss as to why they are having network connectivity problems when everything says they should not.

Anyone have any suggestions on where I should begin?

Thanks.

TCG

EDIT: One thing I just noticed (and never noticed before): when I look at the Network Adapters under Configuration, one of the NICs on the vSwitch that this one particular VM uses shows a completely different "Networks" range than the other NICs. 192.168.80 is the odd one, where the others are 172.16.141.x.

Could that be the issue?

7 Replies
Milton21
Hot Shot

Did HA move the systems?

How many systems are running on the ESX host? You may need more virtual ports.

thecoffeeguy
Contributor

To my knowledge, HA did not move the systems. Everything is "in the Green" looking at things right now.

I have 3 ESX hosts with 16 VMs right now. I am getting close to topping out on this particular cluster, though. Memory utilization is around 70% on all ESX hosts.

RenaudL
Hot Shot

EDIT: One thing I just noticed (and never noticed before): when I look at the Network Adapters under Configuration, one of the NICs on the vSwitch that this one particular VM uses shows a completely different "Networks" range than the other NICs. 192.168.80 is the odd one, where the others are 172.16.141.x.

Could that be the issue?

It looks like it: somebody may have changed the VLAN setup or cabling on this system without your knowledge. If you're using IP-based load balancing, then you'll certainly get erratic networking behavior, because some packets will be forwarded to the wrong LAN while others are sent on the right one. That explains everything.
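
To illustrate why IP-based load balancing turns one miscabled uplink into erratic, per-destination failures, here is a minimal Python sketch. It assumes a simplified hash (XOR of the source and destination addresses, modulo the number of uplinks); the actual ESX algorithm may differ in detail, and the NIC names and addresses are made up for the example.

```python
import ipaddress

def pick_uplink(src_ip, dst_ip, uplinks):
    """Toy IP-hash policy: XOR both addresses, then take the result modulo the uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return uplinks[(src ^ dst) % len(uplinks)]

# Hypothetical team: vmnic1 is the uplink cabled to the wrong LAN.
uplinks = ["vmnic0", "vmnic1"]
miswired = {"vmnic1"}

vm_ip = "172.16.141.50"
for peer in ["172.16.141.1", "172.16.141.20", "172.16.141.21", "172.16.141.22"]:
    nic = pick_uplink(vm_ip, peer, uplinks)
    status = "lost (wrong LAN)" if nic in miswired else "delivered"
    print(f"{vm_ip} -> {peer}: {nic}, {status}")
```

In this toy model the same VM reaches some peers and not others, depending only on which uplink the address pair hashes to. That matches the mix of symptoms in the original post.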

thecoffeeguy
Contributor

Looks like you were right. I just looked at the wiring, and one of the NICs (the one in the 192.168.80.x range) is plugged into the wrong subnet.

Just out of curiosity, since that physical NIC is plugged into that subnet, how does the NIC within ESX get that IP address range? Does that make sense?

RenaudL
Hot Shot

We just sample the IP headers of some incoming packets in order to guess what the subnet may be. It is a straightforward trick, and it paid off in this case: you could easily have wasted hours trying to find an error in your ESX configuration :)
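
As a rough illustration of the idea (not the actual ESX implementation), the sketch below guesses a network from a handful of sampled source addresses by taking the smallest IPv4 network that covers them all, then flags any NIC whose guess does not overlap with any other NIC's. The function names, NIC names, and sample addresses are all hypothetical.

```python
import ipaddress

def guess_network(sampled_ips):
    """Return the smallest IPv4 network that covers every sampled address."""
    values = [int(ipaddress.IPv4Address(ip)) for ip in sampled_ips]
    # Bits that differ between the lowest and highest address are host bits.
    host_bits = (min(values) ^ max(values)).bit_length()
    base = ipaddress.IPv4Address(min(values))
    return ipaddress.ip_network(f"{base}/{32 - host_bits}", strict=False)

def odd_uplinks(samples_by_nic):
    """Return the NICs whose guessed network overlaps with no other NIC's guess."""
    guesses = {nic: guess_network(ips) for nic, ips in samples_by_nic.items()}
    odd = {}
    for nic, net in guesses.items():
        others = [g for name, g in guesses.items() if name != nic]
        if others and not any(net.overlaps(g) for g in others):
            odd[nic] = net
    return odd

# Made-up samples: vmnic1 sees traffic from a different LAN.
samples = {
    "vmnic0": ["172.16.141.10", "172.16.141.200"],
    "vmnic1": ["192.168.80.5", "192.168.80.77"],
    "vmnic2": ["172.16.141.33", "172.16.141.90"],
}
for nic, net in odd_uplinks(samples).items():
    print(f"{nic} looks miscabled: observed network {net}")
```

The guess is only a hint: a few samples can make the range look narrower than the real subnet, which is why the comparison uses overlap rather than exact equality.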

ESX 3.5 has CDP support which eases problem identification even more.

cgarrett
Contributor

We had a similar issue with a client using blade servers with integrated ethernet switches in their blade chassis.

They had connected uplinks to some (but not all) of these integrated switches, due to core switch port constraints. They had then configured one vswitch with multiple NICs, including one that was connected to a physical switch with no uplink.

Because they had left the Network Failure Detection setting on its default of "Link Status Only", the load balancing algorithm of ESX would attempt to send packets via the "un-Uplinked" NIC. As the NIC was connected to the integrated switch, the link status was okay, but as there was no uplink, the sessions would fail, and the VMs would intermittently lose communications.

The problem was fixed easily enough by disabling those NICs that were connected to a "dead-end" switch. In your case, I'd disable the NIC connected to the 192.168.80 segment until you work out why that change occurred.
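
The dead-end case can be sketched in a few lines of Python. This is a toy model with hypothetical names, not ESX code: it only shows that a check based on link state keeps using an uplink whose switch has no path onward, while any check that tests reachability beyond the first switch (ESX offers Beacon Probing as the alternative setting, which this sketch does not attempt to model) would drop it.

```python
from dataclasses import dataclass

@dataclass
class Uplink:
    name: str
    link_up: bool       # physical link to the adjacent switch is up
    reaches_core: bool  # that switch has a working uplink of its own

def active_uplinks(uplinks, detection):
    """Return the names of the uplinks the team keeps using under a detection mode."""
    if detection == "link_status_only":
        return [u.name for u in uplinks if u.link_up]
    if detection == "end_to_end":  # hypothetical stand-in for a probe-based check
        return [u.name for u in uplinks if u.link_up and u.reaches_core]
    raise ValueError(f"unknown detection mode: {detection}")

# Made-up team: vmnic1 is plugged into a blade switch that has no uplink.
team = [
    Uplink("vmnic0", link_up=True, reaches_core=True),
    Uplink("vmnic1", link_up=True, reaches_core=False),
]
print(active_uplinks(team, "link_status_only"))
print(active_uplinks(team, "end_to_end"))
```

With link status only, both NICs stay in the team, so any session balanced onto vmnic1 fails even though nothing reports an error.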

CG

BenLe
Enthusiast

There are several things to verify.

The basic things are:

- Are the NICs of the vSwitch connected to the same switch?

- Are the switch ports configured for the same VLAN?

- Are the speed/duplex settings correct for each NIC and port?

Then you can look further into the details:

- Do you use a "port security" feature on the switch(es)?

This can prevent the vSwitch load balancing from working, because the switch raises errors when it sees the same MAC address on more than one port.

- Are you using multiple physical switches?

Then you should investigate whether the ARP table is refreshed quickly enough that only one physical switch knows the MAC/IP combination at any time.
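
The basic checks above boil down to comparing a few attributes across every NIC in the vSwitch. Below is a small Python sketch of that comparison. It assumes you have already collected the switch, VLAN, speed, and duplex for each NIC by hand; the field names and values are hypothetical and nothing here queries ESX or a physical switch.

```python
from collections import defaultdict

def inconsistent_fields(uplinks, fields=("switch", "vlan", "speed", "duplex")):
    """Return {field: {value: [nic names]}} for every field where the uplinks disagree."""
    problems = {}
    for field in fields:
        by_value = defaultdict(list)
        for nic in uplinks:
            by_value[nic[field]].append(nic["name"])
        if len(by_value) > 1:
            problems[field] = dict(by_value)
    return problems

# Made-up inventory of one vSwitch's uplinks.
uplinks = [
    {"name": "vmnic0", "switch": "sw1", "vlan": 141, "speed": 1000, "duplex": "full"},
    {"name": "vmnic1", "switch": "sw1", "vlan": 80, "speed": 1000, "duplex": "full"},
    {"name": "vmnic2", "switch": "sw1", "vlan": 141, "speed": 1000, "duplex": "full"},
]
for field, groups in inconsistent_fields(uplinks).items():
    print(f"{field} differs: {groups}")
```

A difference is not always an error (a team spread across two physical switches is a legitimate design), so treat the output as a list of things to look at, not a verdict.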

Cheers,

Ben
