Hi admins,
I have a lot of problems with one of my HP DL385 G7 servers.
This host will randomly disconnect itself from my vCenter. (vCenter 4.1 u1)
When this happens I try to restart the management agents from the DCUI or with /sbin/services.sh restart.
But nothing happens!
I try /sbin/reboot, but still nothing happens...
I have updated all the firmware on this server according to HP's recommendations.
I have reinstalled the server 3 times with different media just to be sure...
The only way to solve the problem is to power-cycle the server by pushing the button, and when I do this I of course have to shut down my VMs.
Any input from you before I raise an SR?
Regards
Tyler
There are many possibilities, but since you are able to open an SR I would advise doing so, as identifying the root cause from the logs would be fast and precise. I'm leaning towards a hardware-related issue.
Hi Tyler,
First thing: is your management IP pingable (while the host is unresponsive)?
1. Check the vpxd service status on the ESX host.
2. Check for errors in /var/log/messages.
3. In vCenter, look for any errors in Tasks and Events and also in the logs.
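Ping only tells you the network stack is alive, so it can help to probe the management ports as well. Below is a minimal sketch in Python, meant to be run from the vCenter server or an admin workstation. The function name `probe_ports` is hypothetical, and the port list is an assumption (22 for SSH, 443 for HTTPS to the host, 902 for vCenter agent traffic), so adjust it to your setup.

```python
import socket

# Assumed port list: 22 (SSH), 443 (HTTPS to the host), 902 (vCenter agent traffic).
DEFAULT_PORTS = (22, 443, 902)


def probe_ports(host, ports=DEFAULT_PORTS, timeout=3.0):
    """Try a TCP connect to each port and return {port: True/False}."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout or DNS failure
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results
```

Calling `probe_ports("your-host-name")` while the host shows as disconnected gives a quick picture: if 22 answers but 443 does not, the network is probably fine and the management agent on the host is the more likely suspect.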
Regards,
Balu.
Hi,
Thanks for your replies.
Like idle-jam says, it feels like a hardware problem. I agree, but on this particular server I have already replaced the system board because of this problem, which didn't help. (And I have 3 more servers with the same hardware spec that don't have this strange problem.)
Maybe I have to replace the RAM and CPU as well... (All HP tests say that everything is top notch.)
HP ProLiant DL385 G7
AMD Opteron 6174
144 GB RAM
@krishna
The server is pingable all the time.
I can access it through SSH.
How can I check the vpxd service on the server when this problem occurs? (I can't use the VI Client and the server is disconnected in vCenter.)
I can't see anything special in the logs (but I'm not much of a log interpreter, I think).
In vCenter I just see that the host is disconnected...
Regards
Tyler
Update:
When I restart the management agents from the DCUI it seems to hang on "starting USB arbitrator"...
As before, a power cycle solves the problem...
Hi,
You can check the vpxd service using local or remote Tech Support Mode: service vpxd status
Also check the power options in the BIOS; it looks like the server is not able to return from sleep mode.
Regards,
Balu.
Hi Balu,
See below.
Any ideas?
~ # chkconfig --list
DCUI on
TSM-SSH off
TSM off
usbarbitrator on
lbtd on
storageRM on
sensord on
vprobed on
vobd on
wsman on
slpd on
sfcbd-watchdog on
sfcbd off
ntpd on
hostd on
netlogond off
lwiod off
lsassd off
iked off
vmware-vpxa on
~ # service vpxd status
-ash: service: not found
Hi,
Can you confirm whether the vCenter agent (vmware-vpxa) is in a running state when the host's status changes to disconnected in vCenter Server?
Try disabling the power-saving settings in the BIOS. This will help us isolate the problem.
Regards,
Balu.
Hi,
I found someone else who has the same problem as me.
This describes my problem exactly:
--------------------------------------------------------------------------
http://www.experts-exchange.com/Software/VMWare/Q_26366554.html
My new ESXi 4.1 server is having major connectivity problems. It will run fine for a day or so, then suddenly disconnect itself from my vCenter server. All attempts to reconnect fail. Restarting the management agents works, but does nothing to actually bring the connectivity back up.
I can ping the machine. I have changed the IP to a different one, and I still couldn't connect. I have tried removing it from vCenter but I cannot even add it back in. When I try to add it back in I get the following error:
Call "Datacenter.QueryConnectionInfo" for object "Test ESXi" on vCenter Server "Test" failed.
I have tried to connect via DNS name, IP, and FQDN DNS name. I can ping the DNS name and it works fine. All firewalls between the machines are down. Like I said in the beginning it will work for about a day or so, then disconnect. It will refuse to connect until rebooted. So the system works, but something on the server is crashing. The tests of the management network also work fine.
So really I have 2 parts to this question. What is failing and how to fix it? How to recover without rebooting my host. I imagine I could restart a service or something similar, but I am unfamiliar with the ESXi commands :(.
If you need further logs please just ask. Also, I am running vCenter 4.1 U1, on Server 2008 R2. So there should be no incompatibility between them. Also, I cannot connect to the host via HTTP or HTTPS. It just times out. SSH works though
-----------------------------------------------------------------------------------------------
Any advice??
BR
Tyler
It looks like you are having storage issues; these can cause the host to disconnect.
Also note that what you see in the DCUI does not necessarily mean it is hung at that particular process. In my experience it's mostly hostd (the management agent) that can hang indefinitely if there are storage-related issues.
If the issue persists, post the vmkernel log here.
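Before posting, it may help to narrow a copy of the log down to lines that look storage-related. Here is a rough sketch in Python, intended to run on a workstation against a log file copied off the host (for example over SSH). The helper name `find_suspect_lines` is hypothetical, and the keyword list is only my guess at useful search terms, not an official list of vmkernel messages, so treat the hits as a starting point rather than a diagnosis.

```python
import re

# Assumed search terms; extend with whatever your HBA or array actually logs.
STORAGE_KEYWORDS = ("scsi", "nmp", "timeout", "abort", "reservation")


def find_suspect_lines(path, keywords=STORAGE_KEYWORDS):
    """Return (line_number, text) pairs for lines containing any keyword."""
    # Case-insensitive match on any of the escaped keywords
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = []
    # errors="replace" keeps the scan going if the log has odd bytes in it
    with open(path, "r", errors="replace") as log:
        for number, line in enumerate(log, start=1):
            if pattern.search(line):
                hits.append((number, line.rstrip("\n")))
    return hits
```

Running it on the copied file and looking at the timestamps of the hits just before the host disconnected should show whether storage errors line up with the disconnects.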