I've got a bit of an awkward problem with one of our production ESXi 3.5 (build 130755) servers. Basically, the VMs seem to be running fine, but I cannot connect to the host using VI Client (it says 'connection problem'). I've restarted the management network from the console, and it tests OK. There are no network problems that I can tell, but the host itself won't allow connections. I use Veeam Monitor as well, and this also cannot connect. SSH is responding OK, and I can log in fine, but the web interface is not responding. There are no rogue processes running, and esxtop shows no abnormalities.
Is there any way I can try to resolve this without rebooting? I did notice that after pressing F2 to log on at the console and entering the credentials, it's very slow to get to the main menu, but once there it seems OK. I've checked what log files I can, and they seem clean.
The server is a Dell PowerEdge 2950 III.
Are you trying to connect from the same subnet? Is there a firewall between you and ESXi?
---
VMware vExpert '2009
Yes, everything's on the same subnet and there are no firewalls or VLANs. It was all working fine until last weekend. It was unresponsive on Monday morning.
Does nothing change when you reboot ESXi?
Assuming you mean does a reboot fix the problem, then I'd rather not have to try that. As I said, it's a production server running production VMs. If I can resolve the problem without rebooting, that would be my primary goal. If I really can't fix it, then I will have to schedule a reboot, but as the VMs are running OK, that is harder to justify.
Did you check whether all services are running?
Try "netstat -tulpn" and see whether ports 902, 443, 8009, etc. are listening for connections.
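For anyone following along, here is a minimal sketch of that port check. It assumes netstat prints the usual Linux-style columns (local address in column 4, state in column 6), which may not hold on every ESXi build. The function name check_listening is made up for illustration. It reads the netstat output on stdin, so the output can also be pasted in from another machine if awk is missing on the host.

```shell
# Report whether each given TCP port shows up in LISTEN state.
# Reads `netstat -tuln`-style output on stdin; ports are the arguments.
# Assumption: column 4 is the local address, column 6 is the state.
check_listening() {
    input=$(cat)
    for port in "$@"; do
        if printf '%s\n' "$input" | awk -v p="$port" '
            $6 == "LISTEN" {
                # The port is the last colon-separated piece of the address
                n = split($4, a, ":")
                if (a[n] == p) found = 1
            }
            END { exit (found ? 0 : 1) }'; then
            echo "$port listening"
        else
            echo "$port NOT listening"
        fi
    done
}
```

Possible usage on the host: netstat -tulpn | check_listening 902 443 8009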
System Engineer
Zen Systems Sdn Bhd
Malaysia
tman24, if it's free ESXi or you just don't have a VMotion license, then I agree, it's a real problem.
Check whether the web server is still running on ESXi. Depending on your network configuration, you can try binding the management interface to another physical port.
Test if you can telnet to port 902.
Maybe you have insufficient resources in your COS.
Run "top" (not esxtop!) and check at the top of the screen whether there are zombie processes.
Do a "ps -auxww | more" and check whether there are processes running that shouldn't be there.
Is any COS partition filling up?
Check with "df -h".
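A small sketch of the partition check, in case it helps. It assumes df prints a header line followed by rows that contain a "NN%" usage field and end with the mount point. The function name full_filesystems and the default threshold of 90 are illustrative choices, not anything from VMware.

```shell
# Print mount points whose usage is at or above a threshold (default 90).
# Reads `df -h`-style output on stdin and skips the header line.
full_filesystems() {
    threshold=${1:-90}
    awk -v t="$threshold" '
        NR > 1 {
            # Scan for the field that looks like a percentage, e.g. "97%"
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+%$/) {
                    pct = $i
                    sub(/%/, "", pct)
                    if (pct + 0 >= t + 0) print $NF, $i
                    break
                }
            }
        }'
}
```

Possible usage: df -h | full_filesystems 90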
-Arnim van Lieshout
Thanks. It is ESXi, so I have limited console support. The web interface is also not responding on the host, so it looks like 443 at least is down (or locked out). Re-binding the management network to another NIC would normally be done through the VI Client, which I can't access on this host, so catch-22! Looks like I'm outa luck.
I'll have a quick check in VIMA to see if this remote console has any joy.
tman24, you can reconfigure which interface is used for management via the server's local console (iLO in the case of an HP server).
You can also do some diagnostics from the local console.
Press Alt-F1, then type "unsupported". You'll get a local login prompt.
Try restarting the hostd service. It might solve the issue, and your VMs will not be affected by this.
To summarize the last couple of replies, there is no 'top' command on ESXi, just 'esxtop', and that shows no rogue processes.
Telnet to 902 - responds ok
Telnet to 443 - no response
Telnet to 8009 - no response
I guess at points like this you see how much has been pulled from ESXi to get the footprint down!
If I kill the hostd service, will it auto-restart? What is the best way of making sure this happens?
Here is the step-by-step:
ps -auxwww | grep hostd (to find the PIDs of the hostd processes)
kill -9 XXXX XXXX (where each XXXX is one of those PIDs)
service mgmt-vmware start
service vmware-vpxa restart
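As a sketch of the first step only: the helper below pulls the hostd PIDs out of a process listing so they can be reviewed before anything is killed. It assumes standard Linux "ps aux" columns with the PID in column 2. The ps on ESXi may lay its columns out differently, so check the raw output first. The name hostd_pids is made up for illustration.

```shell
# Print the PIDs of hostd processes from `ps aux`-style output on stdin.
# Assumption: the PID is in column 2. Lines for the grep itself are skipped.
hostd_pids() {
    awk '$0 ~ /hostd/ && $0 !~ /grep/ { print $2 }'
}
```

Possible usage: ps auxwww | hostd_pids

It deliberately only prints the PIDs rather than piping them into kill, so nothing is stopped by accident.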
Hope this helps....
Thanks,
Suresh...
Thanks for the suggestions.
Firstly, ps -auxwww is not fully supported on ESXi. Only ps -u seems to work, and it shows I have 12 hostd processes running.
kill -9 should work, but do I have to kill ALL the hostd processes?
'service' is not supported on ESXi, but there is a /usr/sbin/services.sh script that seems to offer some service control. Neither mgmt-vmware nor vmware-vpxa appears as a service on ESXi.
Looks more and more likely I'm going to have to schedule a reboot.
tman24, that's why I have VI Enterprise with VMotion. I can just live migrate all VMs, reboot ESX, migrate VMs back and no one will notice.
If it's ESXi and you're using SSH then you've modified /etc/inetd.conf. Assuming you edited /etc/inetd.conf and killed inetd (kill -HUP) to enable busybox, could you have killed the wrong process or created an error in /etc/inetd.conf? Output of "ps a" might help...
--Collin C. MacMillan
SOLORI - Solution Oriented, LLC
Yes, I've edited inetd.conf to enable SSH on all my ESXi servers, and it's worked without fault. This server has been online for almost 3 months now though, and I haven't edited any local file since it went live, so I do not suspect any editing of files has caused any problems.
Tman,
This may look really stupid and elementary, but it is worth checking: we had the same problem with some ESX hosts in the past. SSH was fine and the processes on the server were fine, but vCenter couldn't connect.
It turned out that the vCenter network connection was configured for 100Mbps full duplex while the switch port was configured for 100Mbps half duplex (the network was being upgraded). So I would make sure that the connections from vCenter to the switch and from ESX to the switch use the same settings at both ends. If possible, avoid autonegotiation and hardcode them.
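A rough sketch for spotting that kind of mismatch on the host side. It assumes "esxcfg-nics -l" prints a header line and then one row per NIC, starting with the NIC name and containing a "Half" or "Full" duplex value. The exact column order is an assumption, so every field is scanned. The name half_duplex_nics is made up for illustration.

```shell
# Print the names of NICs whose listing row reports half duplex.
# Reads `esxcfg-nics -l`-style output on stdin and skips the header line.
half_duplex_nics() {
    awk 'NR > 1 {
        # Column order is assumed unknown, so look at every field after the name
        for (i = 2; i <= NF; i++) {
            if (tolower($i) == "half") { print $1; break }
        }
    }'
}
```

Possible usage: esxcfg-nics -l | half_duplex_nics

This only covers the host end. The switch port and the vCenter machine's NIC still have to be checked separately.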
Just my 2 cents... that was our problem at that time.
Pablo