VMware Cloud Community
mrman5919
Contributor

Random VM Network dropouts

I have a new install of two identical PowerEdge R710 servers running ESX 4.0 Classic. I have two NICs on a vSwitch, with multiple VM Networks on that switch using 802.1q tagging for VLANs. The problem is that I can have two machines on one of those boxes that use the same VLAN (let's say VLAN 2), and at some unknown point in time one of them will stop responding to network traffic from the outside. I can see the 2003 hosts attempting to send traffic out when I use the console from the VI client, but nothing works. If I vMotion that server to the other box, 95% of the time it resumes working as normal. So far this has happened with 3 different VLANs (our 3 most used), and I can't find any common activity that leads to it. Any ideas? Luckily we don't have many production VMs on this cluster yet, but we were in the process of adding a few dozen production guests and now we can't.

5 Replies
Rumple
Virtuoso

What network switches do you have, and how are they configured (EtherChannel, LACP, or the vendor equivalent)?

jbogardus
Hot Shot

Consider if Port Security is set on the physical switch ports.

This is some information on how this behaves for Cisco switches:

kjb007
Immortal

Sounds like an ARP timeout. Before the issue occurs, can you confirm that you can see the MAC addresses of your VMs on your switch, and make sure they are not flapping between ports? When the error occurs, check whether there's an ARP entry on your switch. As another poster mentioned, port security and ARP ageing would be a good place to start looking on the switch side.
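A minimal sketch of the flapping check described above, assuming you have already collected (timestamp, MAC, port) observations from the switch's MAC address table by whatever means you have. The function name, the tuple layout, and the sample MAC and port values are hypothetical and for illustration only; the code does not query a switch itself.

```python
from collections import defaultdict


def find_flapping_macs(observations, min_moves=2):
    """Return {mac: move_count} for MACs seen moving between ports.

    observations: iterable of (timestamp, mac, port) tuples, in any order.
    A "move" is counted each time a MAC shows up on a different port
    than in its previous observation.
    """
    last_port = {}
    moves = defaultdict(int)
    for _, mac, port in sorted(observations):
        mac = mac.lower()
        if mac in last_port and last_port[mac] != port:
            moves[mac] += 1
        last_port[mac] = port
    return {mac: count for mac, count in moves.items() if count >= min_moves}


# Hypothetical sample data: one VM MAC bounces between two switch ports,
# the other stays put.
sample = [
    (1, "00:50:56:AA:BB:01", "3/1"),
    (2, "00:50:56:AA:BB:01", "4/1"),
    (3, "00:50:56:AA:BB:01", "3/1"),
    (1, "00:50:56:AA:BB:02", "3/2"),
    (3, "00:50:56:AA:BB:02", "3/2"),
]
flapping = find_flapping_macs(sample)
print(flapping)
```

A single move is not necessarily a problem (a vMotion moves a MAC once), which is why the sketch only reports MACs that move at least `min_moves` times.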

-KjB

mrman5919
Contributor

Thanks for the replies from all three of you. The ports in question are on Catalyst 6500s that are still running CatOS due to some legacy equipment. They are set to VLAN trunking with dot1q enabled on all 4 ports in this case (command is set trunk mod/port on dot1q). No EtherChannel, since they are split between the two switches for redundancy. I have a third ESX server that is also running vSphere 4, which was a clean install from 3.5. It never had these problems on 3.5, and I don't have a way to directly test it on that box.

I've triple checked and port security is disabled on these ports. I have also been watching the console and haven't seen any flapping on the ports yet.

As soon as the next one drops, I'll take a look at ARP and see if there is something to that.

mrman5919
Contributor

Well, I was able to find my answer. It turned out that the trunk ports were working properly, but the machines that were on the trunk's native VLAN weren't accepting the tagging. Apparently the two servers had different native VLANs, which is why a VM would work fine on one box but not on the other. Thanks for the help, guys.
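For anyone who hits the same thing, here is a rough sketch of the sanity check that would have caught this: compare the native VLAN configured on each host's trunk ports against the VLAN IDs tagged on the port groups. On a dot1q trunk the native VLAN is normally carried untagged, so a port group that tags that same VLAN ID may never see its traffic. The host names, vmnic names, port group names, and VLAN numbers below are made up for illustration; you would fill the dictionaries in by hand from your switch and vSwitch configuration.

```python
def find_native_vlan_conflicts(uplink_native_vlans, portgroup_vlans):
    """List (host, uplink, portgroup, vlan) where a tagged port group
    uses the same VLAN ID as the native VLAN of one of the host's uplinks."""
    conflicts = []
    for host, uplinks in uplink_native_vlans.items():
        for uplink, native in uplinks.items():
            for portgroup, vlan in portgroup_vlans.items():
                if vlan == native:
                    conflicts.append((host, uplink, portgroup, vlan))
    return sorted(conflicts)


def native_vlans_differ(uplink_native_vlans):
    """True if the hosts do not all share the same set of native VLANs."""
    per_host = {frozenset(uplinks.values())
                for uplinks in uplink_native_vlans.values()}
    return len(per_host) > 1


# Hypothetical inventory: native VLAN per physical uplink, per host.
uplink_native_vlans = {
    "esx1": {"vmnic0": 1, "vmnic1": 1},
    "esx2": {"vmnic0": 2, "vmnic1": 2},
}
# Hypothetical port groups and the VLAN ID each one tags.
portgroup_vlans = {
    "VM Network VLAN 2": 2,
    "VM Network VLAN 3": 3,
}

print(native_vlans_differ(uplink_native_vlans))
for conflict in find_native_vlan_conflicts(uplink_native_vlans, portgroup_vlans):
    print(conflict)
```

With this made-up data, only the second host is flagged, which matches the symptom of a VM working on one box and not on the other.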
