I have a host showing disconnected in VC. I cannot SSH into it, and of course my remote console isn't working either. VC shows that there are guests running on it, and I can get to those guests, but I don't believe they are actually running on that host. Is there a way, from the guest, to tell which host it is actually running on, or can I believe what VC shows?
I left out an important detail: all the guests that show as part of this host show as disconnected too. When I look at the summary of one of those guests, it says it is on the disconnected host.
I'd say they are on that host if they are showing as disconnected. Whenever I've experienced this issue, I've gone to the console on my ESX server, logged in, and restarted the management agents. This should bring the server back to connected in VC and allow you to export the logs.
Can you ping a guest or the host in question? Even though the guests show up as disconnected, you should still be able to see the summary tab. If you can't SSH into the host in question, you may have to reboot it.
...but if the guests are alive, try restarting your VCMS service; that may refresh the guests.
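If it helps, the ping check can be scripted from any Linux box on the management network. This is only a rough sketch: the `check_reachable` function name and the `PING` override variable are made up for illustration, and it assumes a ping that accepts `-c` for the packet count.

```shell
# check_reachable TARGET...: print "<target> up" or "<target> down" for each
# host name or IP address given. Hypothetical helper, not a VMware tool.
# PING can be overridden (e.g. for testing); it defaults to the system ping.
check_reachable() {
    for target in "$@"; do
        # Send two echo requests; only the exit status matters here.
        if ${PING:-ping} -c 2 "$target" >/dev/null 2>&1; then
            echo "$target up"
        else
            echo "$target down"
        fi
    done
}
```

You would call it with the host's management address followed by the guest addresses, for example `check_reachable esx01.example.com vm01.example.com` (both names are placeholders). A guest that answers while the host does not would be worth a closer look.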
I would try to ping the host or the VMs. Can you connect the VIC directly to that host? If you can afford the downtime, I would migrate the VMs to another host or shut them down, then SSH into the host in question and restart the management agent with
service vmware-vpxa restart
Hope this helps!
I definitely wouldn't reboot this host if your VMs are accessible. If your ESX management IP is pingable, it most likely means the hostd process has crashed. You can just log in to the host and restart the management services. Don't take downtime on your VMs; simply restart the management services on the ESX host. After you do this, you can send the logs to VMware and they'll find out why hostd crashed in the first place.
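For reference, restarting the management services from the service console might look something like the sketch below. It assumes a classic ESX 3.x host, where hostd is controlled by the `mgmt-vmware` service and the VirtualCenter agent by `vmware-vpxa`. The `restart_agents` wrapper and the `DRY_RUN` variable are just illustrative names, so check the service names on your build before running it.

```shell
# restart_agents: restart the ESX service-console management agents.
# Assumption: classic ESX 3.x service names (mgmt-vmware = hostd,
# vmware-vpxa = VirtualCenter agent). Hypothetical wrapper function.
# With DRY_RUN=1 it only prints the commands instead of running them.
restart_agents() {
    for svc in mgmt-vmware vmware-vpxa; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "service $svc restart"
        else
            service "$svc" restart
        fi
    done
}
```

Setting `DRY_RUN=1` first lets you see what would be run. Only the management agents are restarted here; the script does not power off or migrate any VMs itself.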
I got into the console finally:
16:03:33:18.894 cpu0:1024)VMNIX: <0>scsi: device set offline - command error recovery failed: host 1 channel 0 id 0 lun 0
16:03:33:19.139 cpu0:1024)VMNIX: <0>journal commit I/O error
The ESX splash page is still there though. I attached a screenshot for fun.
This is a SCSI "device not ready / offline" class error, indicating an issue with a SCSI drive. The drive can be local to the computer or remotely connected over a block-mode protocol (SCSI3, FCP, etc.). If you are using a SAN-attached array, confirm that the fibre channel interconnects are OK, with no sharp bends, and firmly seated in the SFPs; then check the firmware of your HBA, the SAN switch, and the storage array to confirm you are current. If it's a local drive, make sure it's not getting too hot.
You should never see this error.
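To see how often these messages have occurred, you could count them in a saved copy of the console log. A minimal sketch, assuming the messages are in a plain-text file; the `count_storage_errors` function name is made up, and the patterns are simply the two messages quoted earlier in this thread.

```shell
# count_storage_errors FILE: print how many lines in FILE contain either of
# the two storage-related messages from this thread. Hypothetical helper.
count_storage_errors() {
    # -c prints the number of matching lines; -E enables the | alternation.
    grep -c -E 'device set offline|journal commit I/O error' "$1"
}
```

On a classic ESX service console the kernel messages usually end up in `/var/log/messages`, but that path is an assumption here, and if the root filesystem itself is the device that went offline the log may not have been written.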