VMware Cloud Community
rongill
Contributor

ESX host went down

I have a cluster of 4 hosts. This morning when I came in, one of the hosts showed as not responding. I could ping it but couldn't SSH in or do anything else with it. The guest OSes did move over to another host but were not pingable, even after I rebooted them. All the hosts have access to the same networks.

Only after I rebooted them and moved them back to the original host were they pingable again.

Any ideas?

These were the last 2 event logs for that host before it went down. Any ideas why the cluster would throw an error?

error 6/25/2009 3:00:00 PM Host 172.31.224.38 in Vancouver is not responding

error 6/25/2009 2:59:00 PM HA agent on 172.31.224.38 in cluster Vanc HA in Vancouver has an error

9 Replies
Troy_Clavell
Immortal

Sounds like hostd crashed. Have you checked /var/log/messages?

Also, what is your isolation response set to for HA? I would consider setting it to "leave VMs powered on".

rongill
Contributor

Here are the latest logs. I've changed the isolation response so the VMs stay powered on.

Jun 26 01:22:47 vancsapp04 kernel: resize_dma_pool: unknown device type 12

Jun 26 01:22:47 vancsapp04 last message repeated 2 times

Jun 26 01:22:47 vancsapp04 kernel: qla2x00_set_info starts at address = d2152060

Jun 26 01:22:47 vancsapp04 modprobe: modprobe: Can't locate module qla2300_conf

Jun 26 01:22:47 vancsapp04 kernel: sdag : READ CAPACITY failed.

Jun 26 01:22:47 vancsapp04 kernel: sdag : status = 1, message = 00, host = 0, driver = 08

Jun 26 01:22:47 vancsapp04 kernel: Current sd00:00: sense key Not Ready

Jun 26 01:22:47 vancsapp04 kernel: Additional sense indicates Medium not present

Jun 26 01:22:47 vancsapp04 kernel: sdag : block size assumed to be 512 bytes, disk size 1GB.

Jun 26 01:22:47 vancsapp04 kernel: sdag: I/O error: dev 42:00, sector 0

Jun 26 01:22:47 vancsapp04 kernel: I/O error: dev 42:00, sector 0

Jun 26 01:22:47 vancsapp04 kernel: unable to read partition table

Jun 26 01:22:47 vancsapp04 kernel: scsi4 : SCSI emulation for USB Mass Storage devices

Jun 26 01:22:47 vancsapp04 modprobe: modprobe: Can't locate module qla2300_conf

Jun 26 01:22:47 vancsapp04 kernel: Vendor: HL-DT-ST Model: RW/DVD GCC-4244N Rev: 1.02

Jun 26 01:22:47 vancsapp04 kernel: Type: CD-ROM ANSI SCSI revision: 02

Jun 26 01:22:47 vancsapp04 kernel: Attached scsi CD-ROM sr0 at scsi4, channel 0, id 0, lun 0

Jun 26 01:22:47 vancsapp04 kernel: resize_dma_pool: unknown device type 12

Jun 26 01:22:47 vancsapp04 last message repeated 2 times

Jun 26 01:22:47 vancsapp04 kernel: sr0: scsi-1 drive

Jun 26 01:22:47 vancsapp04 kernel: Uniform CD-ROM driver Revision: 3.12

Jun 26 01:22:47 vancsapp04 kernel: USB Mass Storage support registered.

Jun 26 01:22:47 vancsapp04 kernel: qla2x00_set_info starts at address = d2152060

Jun 26 01:22:47 vancsapp04 kernel: sdag : READ CAPACITY failed.

Jun 26 01:22:47 vancsapp04 kernel: sdag : status = 1, message = 00, host = 0, driver = 08

Jun 26 01:22:47 vancsapp04 kernel: Current sd00:00: sense key Not Ready

Jun 26 01:22:47 vancsapp04 kernel: Additional sense indicates Medium not present

Jun 26 01:22:47 vancsapp04 kernel: sdag : block size assumed to be 512 bytes, disk size 1GB.

Jun 26 01:22:47 vancsapp04 kernel: sdag: I/O error: dev 42:00, sector 0

Jun 26 01:22:47 vancsapp04 kernel: I/O error: dev 42:00, sector 0

Jun 26 01:22:47 vancsapp04 kernel: unable to read partition table

Jun 26 01:22:47 vancsapp04 kernel: Attached scsi generic sg1 at scsi1, channel 0, id 0, lun 0, type 8

Jun 26 01:22:47 vancsapp04 kernel: Attached scsi generic sg9 at scsi1, channel 0, id 5, lun 0, type 12

Jun 26 01:22:47 vancsapp04 kernel: Attached scsi generic sg17 at scsi2, channe

Jun 26 01:22:47 vancsapp04 kernel: Attached scsi generic sg9 at scsi1, channel 0, id 5, lun 0, type 12

Jun 26 01:22:47 vancsapp04 kernel: Attached scsi generic sg17 at scsi2, channel 0, id 0, lun 0, type 12

Jun 26 01:22:47 vancsapp04 modprobe: modprobe: Can't locate module block-major-2

Jun 26 01:22:47 vancsapp04 kernel: Attached scsi generic sg28 at scsi2, channel 0, id 3, lun 0, type 12

Jun 26 01:22:47 vancsapp04 modprobe: modprobe: Can't locate module block-major-2

Jun 26 01:22:47 vancsapp04 kernel: scsi_register_dev_mod starting finish

Jun 26 01:22:48 vancsapp04 kernel: scsi_register_dev_mod done with finish

Jun 26 01:22:48 vancsapp04 kernel: resize_dma_pool: unknown device type 12

Jun 26 01:22:48 vancsapp04 last message repeated 2 times

Jun 26 01:22:48 vancsapp04 kernel: sdag : READ CAPACITY failed.

Jun 26 01:22:48 vancsapp04 kernel: sdag : status = 1, message = 00, host = 0, driver = 08

Jun 26 01:22:48 vancsapp04 kernel: Current sd00:00: sense key Not Ready

Jun 26 01:22:48 vancsapp04 kernel: Additional sense indicates Medium not present

Jun 26 01:22:48 vancsapp04 kernel: sdag : block size assumed to be 512 bytes, disk size 1GB.

Jun 26 01:22:48 vancsapp04 kernel: sdag: I/O error: dev 42:00, sector 0

Jun 26 01:22:48 vancsapp04 kernel: I/O error: dev 42:00, sector 0

Jun 26 01:22:48 vancsapp04 kernel: unable to read partition table

Jun 26 01:23:23 vancsapp04 /usr/lib/vmware/hostd/vmware-hostd[1205]: Accepted password for user root from 172.31.224.24

Jun 26 01:23:36 vancsapp04 last message repeated 2 times

Jun 26 01:23:36 vancsapp04 passwd(pam_unix)[1685]: password changed for vpxuser

Jun 26 01:23:39 vancsapp04 /usr/lib/vmware/hostd/vmware-hostd[1205]: Accepted password for user vpxuser from 127.0.0.1

Jun 26 01:25:57 vancsapp04 sshd[1741]: Connection from 172.31.227.2 port 39867

Jun 26 01:26:03 vancsapp04 sshd[1741]: Accepted password for root from 172.31.227.2 port 39867 ssh2

Jun 26 01:26:03 vancsapp04 sshd(pam_unix)[1741]: session opened for user root by (uid=0)

Jun 26 01:26:16 vancsapp04 sshd[1741]: Connection closed by 172.31.227.2

Jun 26 01:26:16 vancsapp04 sshd[1741]: Closing connection to 172.31.227.2

Jun 26 01:26:16 vancsapp04 sshd(pam_unix)[1741]: session closed for user root

Jun 26 01:33:26 vancsapp04 sshd[1912]: Connection from 172.31.227.2 port 39940

Jun 26 01:34:01 vancsapp04 sshd[1925]: Connection from 172.31.227.2 port 39941

Jun 26 01:34:08 vancsapp04 sshd[1925]: Accepted password for root from 172.31.227.2 port 39941 ssh2

Jun 26 01:34:08 vancsapp04 sshd(pam_unix)[1925]: session opened for user root by (uid=0)

Jun 26 02:01:02 vancsapp04 syslogd 1.4.1: restart.

Jun 26 02:01:08 vancsapp04 kernel: loop: loaded (max 8 devices)

Jun 26 02:26:05 vancsapp04 sshd[4153]: Connection from 172.31.227.2 port 40360
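
A dump like this is easier to scan once it is tallied. Below is a minimal Python sketch, assuming the classic syslog layout seen above ("Mon DD HH:MM:SS host tag: message"). The function name `summarize` and the keyword list are my own choices for illustration, not anything from VMware. It counts lines per source and collects the messages that contain words such as "error" or "failed"; lines without a "tag:" part (for example "last message repeated 2 times") are skipped.

```python
import re
from collections import Counter

# Classic syslog layout: "Mon DD HH:MM:SS host tag: message"
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s(?P<tag>[^:]+):\s?(?P<msg>.*)$"
)

def summarize(lines, keywords=("error", "failed", "unable")):
    """Return (lines per source, suspicious messages with their counts)."""
    by_tag = Counter()
    flagged = Counter()
    for line in lines:
        m = SYSLOG_RE.match(line.strip())
        if not m:
            continue  # not in "tag: message" form, e.g. "last message repeated"
        # Drop a trailing PID such as "[1741]" so sshd sessions group together.
        tag = re.sub(r"\[\d+\]$", "", m.group("tag"))
        by_tag[tag] += 1
        msg = m.group("msg")
        if any(k in msg.lower() for k in keywords):
            flagged[msg] += 1
    return by_tag, flagged

# A few lines copied from the log above.
sample = [
    "Jun 26 01:22:47 vancsapp04 kernel: sdag : READ CAPACITY failed.",
    "Jun 26 01:22:47 vancsapp04 kernel: unable to read partition table",
    "Jun 26 01:22:47 vancsapp04 modprobe: modprobe: Can't locate module qla2300_conf",
    "Jun 26 01:26:03 vancsapp04 sshd[1741]: Accepted password for root from 172.31.227.2 port 39867 ssh2",
    "Jun 26 01:22:47 vancsapp04 last message repeated 2 times",
]

by_tag, flagged = summarize(sample)
print(by_tag.most_common())
print(flagged.most_common())
```

To run it against a real file, pass an open file object instead of the sample list, for example `summarize(open("/var/log/messages"))`.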

Troy_Clavell
Immortal

I don't see anything really weird. Check /var/log/vmware/hostd.log (ls -l will give you a timestamp for the latest log file).

What is your service console memory set to? You may want to think about increasing that as well.

rongill
Contributor

Troy_Clavell
Immortal

Lots of stuff in the log files; it may take a while to go through them and see if I can find anything.

To check your service console memory through the VI Client, go to the Configuration tab of the ESX host and then click Memory. We typically set ours to 800MB.

rongill
Contributor

Ya, I have CPU set to 583 and memory to 800.

These were the values given to me by the vRanger support group.

Troy_Clavell
Immortal

I see a lot of errors like the one below. Are all your ESX hosts added into the cluster by name, and are there no problems with name resolution between your ESX hosts and your license server? I am no expert at figuring this out, but you may want to open an SR and see whether these errors are related to your crash.

2009-06-26 02:26:37.498 'ha-license-manager' 90741680 error Invalid server to change license source to
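
To see how often an error like this repeats and over what time window, something along these lines could help. It is only a sketch in Python, assuming hostd.log lines look like the one quoted above (timestamp, quoted component, thread id, level, message). The optional square brackets in the pattern are my assumption in case the raw file wraps the header in them, and `error_spans` is a made-up name.

```python
import re

# Matches e.g.
# 2009-06-26 02:26:37.498 'ha-license-manager' 90741680 error Invalid server ...
# Optional [ ] around the header are tolerated (assumption about the raw file).
HOSTD_RE = re.compile(
    r"^\[?(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"'(?P<component>[^']+)' (?P<thread>\d+) (?P<level>\w+)\]? (?P<msg>.*)$"
)

def error_spans(lines, level="error"):
    """Map (component, message) -> [count, first timestamp, last timestamp].

    "First" and "last" follow file order, so the input is assumed to be
    in chronological order, as a log file normally is.
    """
    spans = {}
    for line in lines:
        m = HOSTD_RE.match(line.strip())
        if not m or m.group("level") != level:
            continue
        key = (m.group("component"), m.group("msg"))
        ts = m.group("ts")
        if key in spans:
            spans[key][0] += 1
            spans[key][2] = ts
        else:
            spans[key] = [1, ts, ts]
    return spans

# The one line quoted in this thread.
quoted = [
    "2009-06-26 02:26:37.498 'ha-license-manager' 90741680 error "
    "Invalid server to change license source to",
]
for (component, msg), (count, first, last) in error_spans(quoted).items():
    print(component, count, first, last, msg)
```

If the first timestamp of the license error is well before the time the host stopped responding, that would suggest it is background noise rather than the cause, but that is something for the SR to confirm.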

rongill
Contributor

They are added by IP. I've added the proper DNS entries on each host, so resolution shouldn't be a problem. Ya, opening an SR is probably the best thing to do.
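
For anyone who wants to double-check name resolution before opening the SR, here is a rough Python sketch that does a forward lookup and then a reverse lookup for each host name and reports mismatches. It uses only the standard `socket` module. The function name `check_resolution` is made up, the host names you pass in are your own (only vancsapp04 appears in this thread), and the lookup functions are parameters so the logic can be tried without touching real DNS.

```python
import socket

def check_resolution(names,
                     forward=socket.gethostbyname,
                     reverse=socket.gethostbyaddr):
    """Return a list of (name, problem) tuples; an empty list means all good."""
    problems = []
    for name in names:
        try:
            ip = forward(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append((name, "forward lookup failed: %s" % exc))
            continue
        try:
            rname = reverse(ip)[0]  # (hostname, aliases, addresses)
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems.append((name, "reverse lookup of %s failed: %s" % (ip, exc)))
            continue
        # Compare short names only, so "host" and "host.example.com" agree.
        if rname.split(".")[0].lower() != name.split(".")[0].lower():
            problems.append((name, "%s reverses to %s" % (ip, rname)))
    return problems

# Example call (performs real lookups, so it is left commented out):
# for name, problem in check_resolution(["vancsapp04"]):
#     print(name, "->", problem)
```

Running it from each host and from the VirtualCenter server would show whether every machine resolves every other one the same way.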

Thanks

malaysiavm
Expert

I had a similar problem when I VMotioned a VM from host 1 to host 2 during a patch upgrade: the VM's network connection became unreachable for no apparent reason. Disconnecting the network adapter in the VM settings and re-enabling it brought it back to normal. After a while, though, I found that I needed to create a new network adapter for the affected VMs, because they would drop off the LAN again from time to time.

This only happens on some virtual machines, so I did not take it too seriously since I had fixed the problem myself.

Craig

Craig vExpert 2009 & 2010 Netapp NCIE, NCDA 8.0.1 Malaysia VMware Communities - http://www.malaysiavm.com