VMware Cloud Community
edgrigson
Enthusiast

VM causing ESX host to become isolated

We have four ESX 3.5U4 hosts set up in a cluster with HA and DRS configured. This cluster has been running fine for the last couple of weeks (it's only in test at present) with only a handful of VMs running on it. Today we did a V2V of a VMware Server image (Windows 2003), which completed fine, but when we boot this VM the ESX host becomes isolated from the network. So one faulty VM is taking out the entire host! If we leave DRS on Fully Automatic, the VM is then moved to another host, which then becomes isolated too, and so on. Not good!

Logging onto the local console of an isolated host, it looks as if the ESX server is still running fine but has lost access to the network. We have two vSwitches. The first has two portgroups (one for Service Console and one for VM traffic, both using VLAN tagging and with two pNICs attached as active/active), and the second has a few portgroups for NFS and iSCSI storage (and also two pNICs). When the host networking fails, we can still ping the storage but lose access to the default gateway on the Service Console and to all VM networks.

I've tried 'service network restart' to no avail, and both ifconfig and 'esxcfg-nics -l' show all interfaces as up. The h/w is an HP BL460 G6 blade (latest generation) with HP switches in the back of the blade chassis. The HP switches also think the respective ports are active, and doing a 'shut/no shut' doesn't get the network access back, nor does 'ifconfig vmnic0 up'.

The only way we've found to recover the ESX host is to reboot it, at which point the networking works fine again. Until we boot this particular VM, that is - it's reproducible at least. Starting the VM in safe mode works OK, so I'm guessing it's a driver issue conflicting with ESX. We've logged a call with VMware, but the fact that a rogue VM can take out an entire host is very worrying.

I'm going to try updating VMware Tools next. /var/log/messages doesn't show anything useful (that I could see).
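To compare the host's state before and after the problem VM boots, something like the following could be run from the Service Console each time and the two files diffed. This is only a rough sketch: it assumes the standard ESX 3.5 tools (esxcfg-nics, esxcfg-vswitch, esxcfg-vswif) are on the path and that /var/log/vmkernel exists, and the output path and the run_if_present helper are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch: snapshot Service Console network state so "before" and "after" can be diffed.
# Assumption: run as root on the ESX Service Console. OUT is a hypothetical path.
OUT="${OUT:-/tmp/netstate-$(date +%Y%m%d-%H%M%S).txt}"

# Hypothetical helper: run a command if it exists, otherwise note that it was skipped.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "### $*"
        "$@" 2>&1
    else
        echo "### skipped: $1 (not found)"
    fi
}

{
    run_if_present esxcfg-nics -l      # physical NIC link state
    run_if_present esxcfg-vswitch -l   # vSwitches, portgroups, VLAN IDs, uplinks
    run_if_present esxcfg-vswif -l     # Service Console interfaces
    run_if_present ifconfig
    # The VMkernel log may hold networking messages that /var/log/messages lacks.
    if [ -r /var/log/vmkernel ]; then
        echo "### tail /var/log/vmkernel"
        tail -n 200 /var/log/vmkernel
    fi
} > "$OUT"

echo "wrote $OUT"
```

Running it once on a healthy host and once after isolation, then diffing the two files, may show whether anything visible to the Service Console (uplink assignment, VLAN IDs, link state) actually changes when the VM starts.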

Anyone seen this before? Got any ideas how to troubleshoot?

2 Replies
weinstein5
Immortal

Have you checked simple things, such as the respective IP addresses of the VM and the Service Console port?

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

edgrigson
Enthusiast

I have, although they can't clash as they're in different VLANs with no routing between them. Even if they were the same, you wouldn't expect the VM to knock out the Service Console - I'd expect the VM to note the IP conflict as usual and not assign an IP. It's definitely network related, because if I remove the vNIC from the VM and start it up it's OK, but when I re-add a vNIC I'm back to isolating the host. Any other ideas?
