I have a cluster of 4 hosts. This morning when I came in, one of the hosts was showing as not responding. I could ping it but couldn't SSH to it or do anything else with it. The guest OSes did move over to another host but were not pingable, even after I rebooted them. All the hosts have access to the same networks.
Only after I rebooted them and moved them back to the original host could they be pinged again.
Any ideas?
These were the last 2 event log entries for that host before it went down. Any ideas why the cluster would throw an error?
error 6/25/2009 3:00:00 PM Host 172.31.224.38 in Vancouver is not responding
error 6/25/2009 2:59:00 PM HA agent on 172.31.224.38 in cluster Vanc HA in Vancouver has an error
Sounds like hostd crashed. Have you checked /var/log/messages?
Also, what is your isolation response set to for HA? I would consider setting it to "leave VMs powered on".
Here are the latest logs. I've changed the isolation response so the VMs stay powered on.
Jun 26 01:22:47 vancsapp04 kernel: resize_dma_pool: unknown device type 12
Jun 26 01:22:47 vancsapp04 last message repeated 2 times
Jun 26 01:22:47 vancsapp04 kernel: qla2x00_set_info starts at address = d2152060
Jun 26 01:22:47 vancsapp04 modprobe: modprobe: Can't locate module qla2300_conf
Jun 26 01:22:47 vancsapp04 kernel: sdag : READ CAPACITY failed.
Jun 26 01:22:47 vancsapp04 kernel: sdag : status = 1, message = 00, host = 0, driver = 08
Jun 26 01:22:47 vancsapp04 kernel: Current sd00:00: sense key Not Ready
Jun 26 01:22:47 vancsapp04 kernel: Additional sense indicates Medium not present
Jun 26 01:22:47 vancsapp04 kernel: sdag : block size assumed to be 512 bytes, disk size 1GB.
Jun 26 01:22:47 vancsapp04 kernel: sdag: I/O error: dev 42:00, sector 0
Jun 26 01:22:47 vancsapp04 kernel: I/O error: dev 42:00, sector 0
Jun 26 01:22:47 vancsapp04 kernel: unable to read partition table
Jun 26 01:22:47 vancsapp04 kernel: scsi4 : SCSI emulation for USB Mass Storage devices
Jun 26 01:22:47 vancsapp04 modprobe: modprobe: Can't locate module qla2300_conf
Jun 26 01:22:47 vancsapp04 kernel: Vendor: HL-DT-ST Model: RW/DVD GCC-4244N Rev: 1.02
Jun 26 01:22:47 vancsapp04 kernel: Type: CD-ROM ANSI SCSI revision: 02
Jun 26 01:22:47 vancsapp04 kernel: Attached scsi CD-ROM sr0 at scsi4, channel 0, id 0, lun 0
Jun 26 01:22:47 vancsapp04 kernel: resize_dma_pool: unknown device type 12
Jun 26 01:22:47 vancsapp04 last message repeated 2 times
Jun 26 01:22:47 vancsapp04 kernel: sr0: scsi-1 drive
Jun 26 01:22:47 vancsapp04 kernel: Uniform CD-ROM driver Revision: 3.12
Jun 26 01:22:47 vancsapp04 kernel: USB Mass Storage support registered.
Jun 26 01:22:47 vancsapp04 kernel: qla2x00_set_info starts at address = d2152060
Jun 26 01:22:47 vancsapp04 kernel: sdag : READ CAPACITY failed.
Jun 26 01:22:47 vancsapp04 kernel: sdag : status = 1, message = 00, host = 0, driver = 08
Jun 26 01:22:47 vancsapp04 kernel: Current sd00:00: sense key Not Ready
Jun 26 01:22:47 vancsapp04 kernel: Additional sense indicates Medium not present
Jun 26 01:22:47 vancsapp04 kernel: sdag : block size assumed to be 512 bytes, disk size 1GB.
Jun 26 01:22:47 vancsapp04 kernel: sdag: I/O error: dev 42:00, sector 0
Jun 26 01:22:47 vancsapp04 kernel: I/O error: dev 42:00, sector 0
Jun 26 01:22:47 vancsapp04 kernel: unable to read partition table
Jun 26 01:22:47 vancsapp04 kernel: Attached scsi generic sg1 at scsi1, channel 0, id 0, lun 0, type 8
Jun 26 01:22:47 vancsapp04 kernel: Attached scsi generic sg9 at scsi1, channel 0, id 5, lun 0, type 12
Jun 26 01:22:47 vancsapp04 kernel: Attached scsi generic sg17 at scsi2, channel 0, id 0, lun 0, type 12
Jun 26 01:22:47 vancsapp04 modprobe: modprobe: Can't locate module block-major-2
Jun 26 01:22:47 vancsapp04 kernel: Attached scsi generic sg28 at scsi2, channel 0, id 3, lun 0, type 12
Jun 26 01:22:47 vancsapp04 modprobe: modprobe: Can't locate module block-major-2
Jun 26 01:22:47 vancsapp04 kernel: scsi_register_dev_mod starting finish
Jun 26 01:22:48 vancsapp04 kernel: scsi_register_dev_mod done with finish
Jun 26 01:22:48 vancsapp04 kernel: resize_dma_pool: unknown device type 12
Jun 26 01:22:48 vancsapp04 last message repeated 2 times
Jun 26 01:22:48 vancsapp04 kernel: sdag : READ CAPACITY failed.
Jun 26 01:22:48 vancsapp04 kernel: sdag : status = 1, message = 00, host = 0, driver = 08
Jun 26 01:22:48 vancsapp04 kernel: Current sd00:00: sense key Not Ready
Jun 26 01:22:48 vancsapp04 kernel: Additional sense indicates Medium not present
Jun 26 01:22:48 vancsapp04 kernel: sdag : block size assumed to be 512 bytes, disk size 1GB.
Jun 26 01:22:48 vancsapp04 kernel: sdag: I/O error: dev 42:00, sector 0
Jun 26 01:22:48 vancsapp04 kernel: I/O error: dev 42:00, sector 0
Jun 26 01:22:48 vancsapp04 kernel: unable to read partition table
Jun 26 01:23:23 vancsapp04 /usr/lib/vmware/hostd/vmware-hostd[1205]: Accepted password for user root from 172.31.224.24
Jun 26 01:23:36 vancsapp04 last message repeated 2 times
Jun 26 01:23:36 vancsapp04 passwd(pam_unix)[1685]: password changed for vpxuser
Jun 26 01:23:39 vancsapp04 /usr/lib/vmware/hostd/vmware-hostd[1205]: Accepted password for user vpxuser from 127.0.0.1
Jun 26 01:25:57 vancsapp04 sshd[1741]: Connection from 172.31.227.2 port 39867
Jun 26 01:26:03 vancsapp04 sshd[1741]: Accepted password for root from 172.31.227.2 port 39867 ssh2
Jun 26 01:26:03 vancsapp04 sshd(pam_unix)[1741]: session opened for user root by (uid=0)
Jun 26 01:26:16 vancsapp04 sshd[1741]: Connection closed by 172.31.227.2
Jun 26 01:26:16 vancsapp04 sshd[1741]: Closing connection to 172.31.227.2
Jun 26 01:26:16 vancsapp04 sshd(pam_unix)[1741]: session closed for user root
Jun 26 01:33:26 vancsapp04 sshd[1912]: Connection from 172.31.227.2 port 39940
Jun 26 01:34:01 vancsapp04 sshd[1925]: Connection from 172.31.227.2 port 39941
Jun 26 01:34:08 vancsapp04 sshd[1925]: Accepted password for root from 172.31.227.2 port 39941 ssh2
Jun 26 01:34:08 vancsapp04 sshd(pam_unix)[1925]: session opened for user root by (uid=0)
Jun 26 02:01:02 vancsapp04 syslogd 1.4.1: restart.
Jun 26 02:01:08 vancsapp04 kernel: loop: loaded (max 8 devices)
Jun 26 02:26:05 vancsapp04 sshd[4153]: Connection from 172.31.227.2 port 40360
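Most of the excerpt above is the same handful of kernel messages repeated, which makes it hard to see what is unusual. As a rough sketch only (not a VMware tool), the following Python groups identical messages and counts them. It assumes the classic "Mon DD HH:MM:SS host tag: message" syslog layout seen in the paste; the function name `summarize` and the sample list are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout: "Mon DD HH:MM:SS host tag: message" (as in the paste above).
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<tag>[^:]+):\s*(?P<msg>.*)$"
)


def summarize(lines):
    """Count identical (tag, message) pairs, ignoring timestamps and PIDs."""
    counts = Counter()
    for line in lines:
        match = SYSLOG_RE.match(line.strip())
        if match is None:
            # Lines without a "tag:" part, such as
            # "last message repeated 2 times", are skipped.
            continue
        # Drop a trailing "[pid]" so sshd[1741] and sshd[1925] group together.
        tag = re.sub(r"\[\d+\]$", "", match.group("tag"))
        counts[(tag, match.group("msg"))] += 1
    return counts


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Jun 26 01:22:47 vancsapp04 kernel: sdag : READ CAPACITY failed.",
    "Jun 26 01:22:47 vancsapp04 kernel: unable to read partition table",
    "Jun 26 01:22:47 vancsapp04 last message repeated 2 times",
    "Jun 26 01:22:48 vancsapp04 kernel: sdag : READ CAPACITY failed.",
    "Jun 26 01:26:03 vancsapp04 sshd[1741]: Accepted password for root "
    "from 172.31.227.2 port 39867 ssh2",
]

for (tag, msg), n in summarize(SAMPLE).most_common():
    print(n, tag, msg)
```

To run it against a real file, pass `open("/var/log/messages")` to `summarize` instead of the sample list.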
I don't see anything really weird... check /var/log/vmware/hostd.log (ls -l will give you the timestamp of the latest log file).
What is your service console memory set to? You may want to think about increasing that as well.
Lots of stuff in the log files; it may take a while to go through them and see if I can find anything.
To check your service console memory through the VI Client, go to the Configuration tab of the ESX host and then click Memory. We typically set ours to 800MB.
Ya, I have CPU set to 583 and memory to 800.
These were the values given to me by the vRanger support group.
I see a lot of these types of errors. Are all your ESX hosts added to the cluster by name, and are there no problems with name resolution between your ESX hosts and your license server? I am no expert at figuring this out, but you may want to open an SR and see if these errors are related to your crash.
2009-06-26 02:26:37.498 'ha-license-manager' 90741680 error Invalid server to change license source to
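If hostd.log is too long to read by eye, a small script can tally the error-level entries per component so the noisiest ones stand out. This is only a sketch: it assumes every entry follows the "timestamp 'component' thread level message" shape of the line quoted above (with or without surrounding square brackets), and `errors_by_component` is a hypothetical helper name, not part of any VMware tooling.

```python
import re
from collections import Counter

# Assumed entry shape, based on the single line quoted above:
#   2009-06-26 02:26:37.498 'ha-license-manager' 90741680 error <message>
# Optional square brackets around the header are tolerated.
HOSTD_RE = re.compile(
    r"^\[?(?P<ts>\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d+)\s+"
    r"'(?P<component>[^']+)'\s+(?P<thread>\d+)\s+(?P<level>\w+)\]?\s+"
    r"(?P<msg>.*)$"
)


def errors_by_component(lines, level="error"):
    """Count entries of the given level, grouped by component name."""
    counts = Counter()
    for line in lines:
        match = HOSTD_RE.match(line.strip())
        if match and match.group("level") == level:
            counts[match.group("component")] += 1
    return counts


# Sample input: the line from this thread.
SAMPLE = [
    "2009-06-26 02:26:37.498 'ha-license-manager' 90741680 error "
    "Invalid server to change license source to",
]

for component, n in errors_by_component(SAMPLE).most_common():
    print(n, component)
```

Lines that do not match the assumed shape (continuation lines, stack traces) are simply ignored, so the counts are a lower bound.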
They are added by IP... I've added the proper DNS entries on each host, so resolution shouldn't be a problem. Ya, opening an SR is probably the best thing to do.
Thanks
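For anyone wanting to double-check the name resolution point raised above, here is a minimal Python sketch that tries a forward lookup for each name and reports the ones that fail. The host names are placeholders (nothing in this thread gives the real ones), and `check_resolution` is a made-up helper. It only tests resolution from the machine it runs on, so it would need to be run wherever resolution matters.

```python
import socket


def check_resolution(names, resolver=socket.gethostbyname):
    """Map each name to its resolved IPv4 address, or None if lookup fails."""
    results = {}
    for name in names:
        try:
            results[name] = resolver(name)
        except OSError:
            # socket.gaierror and socket.herror are subclasses of OSError.
            results[name] = None
    return results


if __name__ == "__main__":
    # Placeholder names: replace with the real ESX hosts and license server.
    hosts = ["esx01.example.com", "esx02.example.com", "license.example.com"]
    for name, address in check_resolution(hosts).items():
        print(name, "->", address if address else "NOT RESOLVED")
```

The `resolver` parameter is there only so the lookup can be swapped out, for example to test the function without touching DNS.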
I had a similar problem when I VMotioned a VM from host 1 to host 2 during a patch upgrade: the VM's network connection became unreachable for no apparent reason. Disconnecting the network adapter in the VM settings and enabling it again brought it back to normal. After a while, I found that I needed to create a new network adapter for the VMs that had this problem, because otherwise they would drop off the LAN again from time to time.
This only happens on some virtual machines, so I don't take it too seriously since I fixed the problem myself.
Craig
vExpert 2009