VMware Cloud Community
mekoz
Enthusiast

VMotion + Linux = NIC drop

I've been having a lot of trouble with Linux VMs, and only Linux VMs, either dropping their network completely or running in a very degraded state after a VMotion. It doesn't happen every time. I'm running 64-bit RHEL4 and just can't get to the bottom of it. It seems to happen on both 3.0.1 and 3.5u2 ESX hosts, and I'm using the e1000 driver. Any thoughts? I'd appreciate any and all ideas, either on what's going on or on how to troubleshoot further. I'm fairly familiar with Linux, and from both the OS and the ESX point of view everything looks fine (the VMotion completes in a short amount of time and is successful).

Thanks!!!

7 Replies
Aladen
Enthusiast

Check the number of virtual machines on your ESX server, and in particular the number of NICs connected to the vSwitch. We had a case like this where the problem was that we were running out of ports on the vSwitch. We added ports, rebooted ESX, and had no more problems.

Also, what kernel revision are you running?
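
As a rough way to reason about the port check above, the arithmetic can be sketched in Python. This is only an illustration: the function names, the `other_ports` parameter, and the `min_free` threshold are hypothetical, and the real used-port count should come from your own host rather than from this estimate.

```python
def vswitch_port_headroom(configured_ports, vm_nic_counts, other_ports=0):
    """Return how many vSwitch ports remain free.

    configured_ports: port count configured on the vSwitch.
    vm_nic_counts: one entry per VM, the number of its NICs on this vSwitch.
    other_ports: ports consumed by anything other than VM NICs
                 (an assumption; set it for your own environment).
    """
    used = sum(vm_nic_counts) + other_ports
    return configured_ports - used


def is_near_exhaustion(configured_ports, vm_nic_counts, other_ports=0, min_free=8):
    """Flag a vSwitch whose free-port count is below min_free (arbitrary threshold)."""
    free = vswitch_port_headroom(configured_ports, vm_nic_counts, other_ports)
    return free < min_free


if __name__ == "__main__":
    # Hypothetical numbers: 30 VMs with 2 NICs each on a 64-port vSwitch.
    nics = [2] * 30
    print(vswitch_port_headroom(64, nics))
    print(is_near_exhaustion(64, nics))
```

If the headroom is small or negative, adding ports to the vSwitch as described above would be the first thing to try.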

espi3030
Expert

I am running into the same issue. We are running ESX 3.0.2 (106395) and are in the process of applying the latest patches (after some testing) to bring all hosts to (117737). None of my ESX hosts runs more than 10 VMs at any given time, and the vSwitches are all configured with 120 ports. Should I increase that?

I apologize for jumping into this thread; I will gladly open a new one and award points accordingly. Thank you.

espi3030

mekoz
Enthusiast

Let's keep it in the same thread; it sounds like the same issue. I've had some trouble reproducing it at will, so I just wrote a Perl script that will VMotion a Linux VM repeatedly until it loses ping. Have you observed any other anomalies post-VMotion (like loss of console, or extreme slowdown if/when the console comes back)?
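
For anyone who wants to try the same approach, the loop can be sketched roughly as follows. This is not the Perl script mentioned above, just a minimal Python illustration of the idea. The `migrate` callable and the host names are hypothetical placeholders for whatever actually triggers the VMotion in your environment, and `ping_ok` assumes the Linux `ping` flags `-c` and `-W`.

```python
import itertools
import subprocess


def ping_ok(address, count=3, timeout_s=2):
    """Return True if the guest answers at least one ICMP echo (Linux ping flags)."""
    result = subprocess.run(
        ["ping", "-c", str(count), "-W", str(timeout_s), address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def migrate_until_ping_lost(migrate, hosts, check, max_rounds=100):
    """Bounce a VM between hosts until check() reports the guest unreachable.

    migrate: hypothetical callable that takes a target host name and blocks
             until the migration has finished.
    hosts:   host names to cycle through.
    check:   callable returning True while the guest still answers.
    Returns (round_number, host) for the first failure, or None if the guest
    stayed reachable for max_rounds migrations.
    """
    targets = itertools.cycle(hosts)
    for round_number in range(1, max_rounds + 1):
        host = next(targets)
        migrate(host)
        if not check():
            return round_number, host
    return None
```

Passing the reachability check in as a callable, for example `lambda: ping_ok("192.0.2.10")` with a placeholder address, keeps the loop easy to dry-run with fakes before pointing it at real hosts.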

espi3030
Expert

I have a clarification: I'm not sure whether it is dropping the NIC after VMotion. I know for a fact that it is dropping after reverting from a snapshot. Should that be expected? Surely not. I will monitor my environment more closely to narrow the problem down to VMotion or snapshots.

Thank you.

Aladen
Enthusiast

There is this thread that may describe a similar problem:

http://communities.vmware.com/thread/90510

No answer there, though.

jesse_gardner
Enthusiast

mekoz, did you ever get to the bottom of this? We're running into the same situation: occasionally a Linux VM stops responding to ping after a VMotion.

3.5 U3

RHEL 4 and 5, 32 and 64-bit.

Flexible, E1000, and Enhanced Vmxnet NICs.

Texiwill
Leadership

Hello,

If you have issues when reverting from a snapshot, the hypervisor may be overloaded. Large snapshots or high disk I/O have been known to cause networking issues. In general you want to commit or remove snapshots instead of keeping them around, because reads have to go through the snapshot layers to reach the vmdk. When there are lots of layers, per-VM performance can suffer.

You may wish to look at esxtop output; see http://communities.vmware.com/docs/DOC-9279 for help interpreting it.


Best regards,

Edward L. Haletky, VMware Communities User Moderator, VMware vExpert 2009, Virtualization Practice Analyst
Now Available: 'VMware vSphere(TM) and Virtual Infrastructure Security: Securing the Virtual Environment'
Also available: 'VMware ESX Server in the Enterprise'

--
Edward L. Haletky
vExpert XIV: 2009-2023,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill