VMware Cloud Community
ianroche80
Contributor

Virtual Machines become unresponsive

Hi Guys,

We are having a strange issue with one of our hosts (Dell M710) running ESXi 4.0.0, build 261974. Every 30 minutes or so the virtual machines become unresponsive for about 40 seconds to a minute: you can't reach them via RDP, ping, or the console in the VI Client. The VMkernel port stays up while this happens, and access to the host via the VI Client is fine; you can navigate around it. If I try to pull logs from the host, it comes back with a status of "missing".

It's very strange. We did see it on another host previously, which we rebooted straight away, and the problem went away. We are anxious to resolve it and have moved all production VMs off the affected host. I have logged a call with Dell, and they have been unable to find anything as yet. They checked the switch logs on the chassis to see if anything strange was going on with the network, and it looks OK. The problem occurs with VMs on both local and iSCSI SAN storage.

If anyone has seen this before, I'd be very interested to hear from you. I am really struggling to pin this one down, and it looks like a reboot may be required very soon, as we have lost redundancy with this host out of the loop. We have an identical M710 in the cluster with the exact same config, connected to the same network switches, running fine.
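To pin down whether the outages really recur on a fixed interval, it can help to record probe results (for example, ping success or failure against one VM) and group the failures into outage windows. The following is only a rough sketch of that idea: the function names `find_outages` and `gaps_between` are made up for illustration, and the sample data is synthetic, generated to mimic a 40-second outage every 30 minutes rather than captured from the host.

```python
from datetime import datetime, timedelta


def find_outages(samples):
    """Group consecutive failed probes into (first_fail, last_fail) windows.

    samples: iterable of (timestamp, reachable) pairs, sorted by time.
    """
    outages = []
    start = None
    last_fail = None
    for ts, reachable in samples:
        if not reachable:
            if start is None:
                start = ts          # first failed probe of a new outage
            last_fail = ts
        elif start is not None:
            outages.append((start, last_fail))
            start = None
    if start is not None:           # outage still open at end of data
        outages.append((start, last_fail))
    return outages


def gaps_between(outages):
    """Time from the start of each outage to the start of the next one."""
    return [b[0] - a[0] for a, b in zip(outages, outages[1:])]


# Synthetic data (an assumption, not real measurements): one probe every
# 10 seconds for 70 minutes, unreachable for the first 40 seconds of
# every 30-minute period.
t0 = datetime(2010, 1, 1, 12, 0, 0)
samples = [
    (t0 + timedelta(seconds=s), (s % 1800) >= 40)
    for s in range(0, 4200, 10)
]

outages = find_outages(samples)
gaps = gaps_between(outages)
for start, end in outages:
    print(start.time(), "to", end.time())
print("gaps:", gaps)
```

If the gaps come out consistently close to 30 minutes, that points towards something running on a timer (a retry, a rescan, a scheduled job) rather than a random hardware fault.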

Cheers

Ian

4 Replies
AndreTheGiant
Immortal

Is all the hardware firmware up to date?

Do you have OMSA installed, so you can check whether there is anything in the hardware log?

Do you also lose the management interface of the ESXi host?

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
weinstein5
Immortal

What sort of load is running on the host in question? How many VMs? What is the configuration of the ESX server? Is there any VM on that host that runs a process every 30 minutes?

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful

ianroche80
Contributor

The host has virtually no load. I have two test systems on it running nothing other than a bare OS, and neither of them is running any kind of scheduled process. They are pretty much brand new machines with all firmware updated. I don't have OMSA installed; I may have to reboot and run it from a CD. The VMkernel port is on a vSwitch with the machines accessing the production LAN. I never lose connectivity to the host when an outage on the VMs occurs. The host is pingable and the VMs are not, even though they are using the same physical NICs. It looks to me like the host is locking up to a certain extent...

ianroche80
Contributor

Just to let you know, I resolved this. I checked the iSCSI vmhba and saw that it had a dead path to a LUN that was set to offline on the iSCSI SAN. I ran tail -f /var/log/messages at the console while one of the outages was occurring, and I could see the log getting flooded with messages from the initiator trying to connect to this LUN, which was causing the host to fall over. Once I removed this LUN and refreshed the storage adapter, the problem went away and the host is functioning correctly. I have tried to reproduce this on the host, but it didn't occur again. Anyway, the problem is resolved, so I'm happy out for now.
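For anyone who would rather not spot the flood by eye in tail -f, a copy of the log can be bucketed per minute to show when the message rate spikes. This is only a sketch under some assumptions: it takes the first 12 characters of each line to be a classic syslog timestamp down to the minute ("Mmm dd HH:MM"), the helper names `lines_per_minute` and `flooded_minutes` are made up, and the sample lines are placeholders I invented for the demo, not real ESXi log output.

```python
from collections import Counter


def lines_per_minute(lines, keyword, prefix_len=12):
    """Count lines containing keyword, grouped by timestamp prefix.

    prefix_len=12 assumes syslog-style lines starting "Mmm dd HH:MM:SS",
    so the first 12 characters identify the minute.
    """
    counts = Counter()
    needle = keyword.lower()
    for line in lines:
        if needle in line.lower():
            counts[line[:prefix_len]] += 1
    return counts


def flooded_minutes(counts, threshold):
    """Return the minutes with at least `threshold` matching lines."""
    return sorted(minute for minute, n in counts.items() if n >= threshold)


# Placeholder lines (invented for illustration, not real log messages).
sample = (
    ["Mar 10 14:29:58 host demo: unrelated message"]
    + [f"Mar 10 14:30:{s:02d} host demo: iscsi placeholder retry"
       for s in range(50)]
    + ["Mar 10 14:31:05 host demo: iscsi placeholder retry"]
)

counts = lines_per_minute(sample, "iscsi")
print(flooded_minutes(counts, threshold=10))
```

Against a real log you would read the lines from a copy of the file, for example with open(path) as f, and pick the keyword and threshold to suit whatever the initiator actually logs on your host. If the flooded minutes line up with the times the VMs dropped off, that ties the outage to the storage retries.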
