VMware Cloud Community
pascaljr
Contributor

VMware ESXi stopped responding

Hi

I have a home server with a Q6600 quad-core CPU and 8GB of RAM that has been running VMware ESXi 3.5 for about 8 months now. I have 2 datastores of 1TB each (SATA HDs), one with 150GB free and the other with 240GB free. I have 9 VMs running 24x7 on it. Everything was working great until yesterday.

Out of the blue, I stopped getting responses from the VMs on the ESXi host. At first I could connect using the Infrastructure Client, but when I tried to get information from any VM, I would get a message that the VM could not be reached. The host info would show networking, CPU, and memory details, but when I tried to browse a datastore, the client would usually stop responding. Only once was I able to open the datastore where ESXi is installed, and all the VMs were there. Now I can't connect anymore, and I really don't know what to do.

What is the best course of action to diagnose the problem? I can access the ESXi screen without a problem, but I don't know what to do. I was thinking of reinstalling it, maybe with version 4.0, but I'm not sure I should do that. Where (and how) can I access anything that could help me figure out what's wrong?

Tks


Accepted Solutions
AWo
Immortal

You may try these options in this order:

1. Reset Management Agents: http://pubs.vmware.com/vsp40u1_i/wwhelp/wwhimpl/js/html/wwhelp.htm#href=setup/t_restart_mgmt_net.htm...

2. Restart Management Network: http://pubs.vmware.com/vsp40u1_i/wwhelp/wwhimpl/js/html/wwhelp.htm#href=setup/t_restart_mgmt_net2.ht...

3. Restore the standard vSwitch: http://pubs.vmware.com/vsp40u1_i/wwhelp/wwhimpl/js/html/wwhelp.htm#href=setup/t_restore_stnd_switch....

4. Complete reset: You lose all the configuration you made on the ESXi server, but the guests are kept.

http://pubs.vmware.com/vsp40u1_i/wwhelp/wwhimpl/js/html/wwhelp.htm#href=setup/t_reset_the_configurat...


AWo

VCP 3 & 4

\[:o]===\[o:]

=Would you like to have this posting as a ringtone on your cell phone?=

=Send "Posting" to 911 for only $999999,99!=

vExpert 2009/10/11 [:o]===[o:] [: ]o=o[ :] = Save forests! rent firewood! =

8 Replies
AWo
Immortal

Can you ping the guest machines, or have they lost their connectivity as well?

Can you ping between other systems connected to the same switch (or whatever device) that the ESXi host is connected to?


pascaljr
Contributor

The last thing I managed to do while I could still log in through the VI Client was put the host into Maintenance mode, so the guests are not booting anymore. And since I can't connect with the VI Client anymore, I can't really tell.

pascaljr
Contributor

Thanks so, so much for the links.

I'm gonna try these out tonight, after work, when I get home, and let you know.

pascaljr
Contributor

Just in case none of this works, what would be the next step? Keep in mind this is just a home server, and the downtime is not a factor here, ok?

Since I can access the console, on the ESXi, how can I check if the datastores are intact? And if they are, can I copy them over to an external USB drive?

Thanks so much for your time.

AWo
Immortal

I would focus on the networking here, as you can't connect to the ESXi host with the VI Client. Do you have a cross-over cable? Replace your switch with that cable: connect the host directly to another physical system, then check whether you can ping the ESXi host or connect to it with the VI Client.
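For the "can I connect at all" part of that test, a small script run from the other physical system can stand in for the client. This is only a rough sketch: it checks whether a TCP connection can be opened, which is not the same thing as an ICMP ping, and the address and port numbers shown in the usage comment (443 and 902) are assumptions about a typical setup, not values taken from this thread.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when the connection cannot be established.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_ports(host, ports, timeout=3.0):
    """Probe several ports on one host and return {port: reachable}."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}


# Hypothetical usage; replace the address with your ESXi management IP:
#   print(check_ports("192.168.1.10", [443, 902]))
```

If every port comes back False over the cross-over cable too, the problem is more likely on the host side than in the switch.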


pascaljr
Contributor

Good suggestion! I'll try that as well tonight, and post the results back here.

Thank you! Thank you! Thank you!

pascaljr
Contributor

Hi,

I reset the settings back to default, and I was able to connect with the VI Client. I reattached one of my VMs and started booting it up, but I had problems again: the VM tried to boot and ended up locking up, the VI Client became unresponsive, and I couldn't connect to the host again. I went into unsupported mode and checked /var/log/messages, where I found a bunch of errors. Below is a sample:

Aug 31 02:59:36 vmkernel: 0:00:28:41.882 cpu0:2179)StorageMonitor: 196: vmhba33:0:0:0 status =2/0 0xb 0x0 0x0

Aug 31 02:59:37 vmkernel: 0:00:28:42.357 cpu0:5279)<3>ata4: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Aug 31 02:59:37 vmkernel: <4>ata4: status=0x71 { DriveReady DeviceFault SeekComplete Error 0:00:28:42.357 cpu0: 5279)}

last message repeated 1 times

I also got DriveStatusError on some lines of the same file. Looking at /var/log/vmware/hostd-0.log, I see errors right after the vmdk files of the first VM I reattached are opened successfully:

GetPropertyProvider failed for haTask-ha-folder-vm-vim.Folder.registerVm-45

GetPropertyProvider failed for haTask-16-vim.VirtualMachine.powerOn-49

I get several other GetPropertyProvider errors after that, then some timeouts... It seems clear I have a hard disk problem. What can I do to save my VMs? Can I run a disk check on the HDs? If so, how? Thanks!
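To make the hex values in the vmkernel sample above easier to read, the status byte can be decoded bit by bit. The sketch below assumes the standard ATA status register layout and reuses the labels that appear in the log line itself; the function names are made up for illustration, and the error byte (the "04" after the slash) is extracted but not decoded.

```python
import re

# ATA status register bits, highest bit first. The labels follow the ones
# printed in the quoted vmkernel line (DriveReady, DeviceFault, ...).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Matches the "stat/err 0x71/04" fragment of the log line.
STAT_ERR_RE = re.compile(r"stat/err 0x([0-9a-fA-F]+)/([0-9a-fA-F]+)")


def decode_status(status):
    """Return the names of all bits set in an ATA status byte."""
    return [name for mask, name in STATUS_BITS if status & mask]


def parse_stat_err(line):
    """Extract (status, error) as integers from a log line, or None."""
    match = STAT_ERR_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


sample = "ata4: translated ATA stat/err 0x71/04 to SCSI SK/ASC/ASCQ 0xb/00/00"
status, err = parse_stat_err(sample)
print(hex(status), decode_status(status))
# prints: 0x71 ['DriveReady', 'DeviceFault', 'SeekComplete', 'Error']
```

The decoded bits match the flags the kernel printed on the following log line. A set DeviceFault bit together with SCSI sense key 0xb (Aborted Command) is consistent with the disk behind ata4 reporting a fault, though the log alone does not prove which component is failing.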
