VMware Cloud Community
mwkirk
Contributor

vSphere 4.1 U1 on DL380G7

I brought in a new host and installed ESXi 4.1 U1, downloaded from the HP site. The install went fine: I got the box up, added it to the HA cluster, moved some VMs to it, and life was good. However, anywhere from 30 minutes to a couple of hours after the server comes up, the VMs basically stop responding. vCenter still lists the VMs and shows them as running, but when I try to power them down or make any changes, the operation times out. Connecting directly to the host with the VMware client also times out.

The box has iLO, so I can get to the console and log in. Usually I can navigate through the menu, but when I went to enable Tech Support Mode for VMware tech support, the console hung as soon as I selected the option. Also, once I got VMware support on the console, they could normally run one command and then the prompt would hang. I have upgraded the BIOS and run a memory check, which came back fine. I am about to set up a syslog server so I can see whether any useful events are being logged.

Eventually the host shows as disconnected in vCenter, and even a reboot from the console just hangs. As a last resort I have to do a reset through iLO and bring the box back up, which starts the process all over again.

Has anyone ever run into anything similar? I will post more information as I get it, but I wanted to throw the general issue out there in case anyone has seen anything like this before.

4 Replies
Dave_Mishchenko
Immortal

Is the host booting from disk or a flash device? If it's disk, then the syslog local path should already be set and you should be able to get the log file from the host (even after it is reset). If it's not, then set the path (Configuration > Software > Advanced Settings, look for Syslog.Local.DatastorePath). Changing that doesn't require a reboot.

I would also open an SSH or local console session while the host is still running, given how frequently the problem occurs. If your SSH session stays active, you can run vm-support -w <datastorepath> to generate a support bundle. That should assist VMware support in their investigation.

Dave
VMware Communities User Moderator


mwkirk
Contributor

Thanks for that information, as that will help. I would have expected VMware support to think of that, since the support rep must have said five times that the logs get erased on reboot so there would be nothing to look at.

Sreejesh_D
Virtuoso

Can you check /var/log/messages for APD (All Paths Down) errors? The error will look similar to the following. The host will enter a hung state if there is an APD.

APR 15 15:38:15 vmkernel: 0:00:53:27.619 cpu12:4386)WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.600601604e702900ec264057285fe011" - failed to issue command due to Not found (APD), try again...

If so, please check your storage infrastructure (FC cabling, HBA, switch, storage) to find the reason for the path failure. The following KB will give you more insight into APD issues.

http://kb.vmware.com/kb/1016626
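
If the log is large, a short script can make the check quicker. The following is only a rough sketch in Python, meant to be run on a workstation against a copy of the log (or the file written by a syslog server). The function name find_apd_devices is made up for illustration, and the pattern is based solely on the sample warning quoted above, so other builds may word the message differently.

```python
import re

# Assumption: APD warnings look like the sample line quoted in this post.
# The pattern captures the device ID between the double quotes.
APD_PATTERN = re.compile(
    r'nmp_DeviceAttemptFailover.*device "(?P<device>[^"]+)".*\(APD\)'
)


def find_apd_devices(lines):
    """Return a dict mapping each device ID to its number of APD warnings."""
    counts = {}
    for line in lines:
        match = APD_PATTERN.search(line)
        if match:
            device = match.group("device")
            counts[device] = counts.get(device, 0) + 1
    return counts


# Example usage against a copied log file (the file name is hypothetical):
# with open("messages", errors="replace") as handle:
#     for device, count in sorted(find_apd_devices(handle).items()):
#         print(device, count)
```

A device that shows up repeatedly points at the LUN whose paths are dropping, which narrows down where to look on the storage side.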

hank-ger
Enthusiast

Hi,

I also have an HP DL380G7 installed with ESXi 4.1 U1 and everything works fine.

Did you use the HP ESXi install source?
