tman24
Enthusiast

Server running, but VI client can't connect.

I've got a bit of an awkward problem with one of our production ESXi 3.5 (build 130755) servers. The VMs seem to be running fine, but I cannot connect to the host using the VI Client (it reports 'connection problem'). I've restarted the management network from the console, and it tests OK. There are no network problems that I can tell, but the host itself won't accept connections. I also use Veeam Monitor, and it cannot connect either. SSH is responding, and I can log in fine, but the web interface is not responding. There are no rogue processes running, and esxtop shows no abnormalities.

Is there any way I can try to resolve this without rebooting? I did notice that after pressing F2 to log on at the console and entering the credentials, it's very slow to get to the main menu, but once there it seems OK. I've checked what log files I can, and they seem clean.

Server is a Dell PowerEdge 2950 III

0 Kudos
18 Replies
AntonVZhbankov
Immortal

Are you trying to connect from the same subnet? Is there a firewall between you and the ESXi host?


---

VMware vExpert '2009

http://blog.vadmin.ru

EMCCAe, MCITP: SA+VA, VCP 3/4/5, VMware vExpert http://blog.vadmin.ru
0 Kudos
tman24
Enthusiast

Yes, everything's on the same subnet, and there are no firewalls or VLANs. It was all working fine until last weekend. It was unresponsive on Monday morning.

0 Kudos
AntonVZhbankov
Immortal

Does nothing change after an ESXi reboot?


0 Kudos
tman24
Enthusiast

Assuming you mean "does a reboot fix the problem?", I'd rather not have to try that. As I said, it's a production server running production VMs. Resolving the problem without rebooting is my primary goal. If I really can't fix it, then I will have to schedule a reboot, but as the VMs are running OK, that makes it harder to justify.

0 Kudos
athlon_crazy
Virtuoso

Did you check whether all services are running?

Try running "netstat -tulpn" and see whether ports 902, 443, 8009, etc. are listening for connections.

System Engineer

Zen Systems Sdn Bhd

Malaysia

www.no-x.org

0 Kudos
AntonVZhbankov
Immortal

tman24, if it's the free ESXi or you just don't have a VMotion license, then I agree, it's a real problem.

Check whether the web server is still running on ESXi. Depending on your network configuration, you can try to bind the management interface to another physical port.


0 Kudos
avlieshout
VMware Employee

Test if you can telnet to port 902.

Maybe you have insufficient resources in your COS.

Run "top" (not esxtop!) and check at the top of the screen whether there are zombie processes running.

Do a "ps -auxww | more" and check whether there are processes running that shouldn't be there.

Is any COS partition filling up?

Check with "df -h".

-Arnim van Lieshout


Arnim van Lieshout Blogging: http://www.van-lieshout.com Twitter: http://www.twitter.com/avlieshout If you find this information useful, please award points for "correct" or "helpful".
0 Kudos
tman24
Enthusiast

Thanks. It is ESXi, so I have limited console support. The web interface is also not responding on the host, so it looks like 443 at least is down (or locked out). Re-binding the management network to another NIC would normally be done through the VI Client, which I can't access on this host, so catch-22! Looks like I'm out of luck.

I'll have a quick check in VIMA to see if this remote console has any joy.

0 Kudos
AntonVZhbankov
Immortal

tman24, you can reconfigure which interface is used for management via the server's local console (or via iLO in the case of an HP server).

Also you can do some diagnostic stuff from local console.

Press Alt-F2, then type "unsupported". You'll get local login prompt.


0 Kudos
virtualprince
Enthusiast

Try restarting the hostd service. It might solve the issue, and your VMs will not be affected by this.

Thanks, Suresh
0 Kudos
tman24
Enthusiast

To summarize the last couple of replies: there is no 'top' command on ESXi, just 'esxtop', and that shows no rogue processes.

Telnet to 902 - responds ok

Telnet to 443 - no response

Telnet to 8009 - no response

I guess at points like this you see how much has been pulled from ESXi to get the footprint down!
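The telnet checks above can also be scripted from a workstation on the same subnet. The following is only a minimal sketch using Python's standard socket module; the function names, the default port list (taken from the ports mentioned in this thread) and the timeout are illustrative assumptions, not anything supplied by VMware.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports=(902, 443, 8009), timeout=3.0):
    """Map each port to True (accepting connections) or False (no response)."""
    return {port: port_is_open(host, port, timeout) for port in ports}
```

Calling check_ports("192.0.2.10") (a placeholder address; substitute the host's management IP) returns a dictionary keyed by port number. Note that a successful connect only shows that something is listening, not that the service behind the port is healthy.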

0 Kudos
tman24
Enthusiast

If I kill the hostd service, will it restart automatically? What is the best way of making sure this happens?

0 Kudos
virtualprince
Enthusiast

Here is the step by step...

  1. ps -auxwww | grep hostd (to find the PIDs of the hostd processes)

  2. kill -9 XXXX XXXX (where XXXX are the PIDs)

  3. service mgmt-vmware start

  4. service vmware-vpxa restart

Hope this helps....

Thanks,

Suresh...

if you found my answer to be useful, feel free to mark it as Helpful or Correct.

0 Kudos
tman24
Enthusiast

Thanks for the suggestions.

Firstly, ps -auxwww is not fully supported on ESXi; only ps -u seems to work, and it shows I have 12 hostd processes running.

kill -9 should work, but do I have to kill ALL the hostd processes?

'service' is not supported on ESXi, but there is a /usr/sbin/services.sh script that seems to offer some service control. Neither mgmt-vmware nor vmware-vpxa appears as a service on ESXi.

Looks more and more likely I'm going to have to schedule a reboot.
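For anyone wanting to list the hostd entries before deciding what to do with them, here is a rough sketch that filters saved ps output on a workstation. It assumes the PID is the first whitespace-separated column, which may not match the ps layout on ESXi 3.5; the function name hostd_pids is hypothetical. It only lists PIDs and does not answer whether all of them need to be killed.

```python
def hostd_pids(ps_output, name="hostd"):
    """Collect first-column PIDs from ps lines that mention `name`.

    Assumption: each process line starts with a numeric PID.
    Header lines and blank lines are skipped.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue  # not a process line under the stated assumption
        if any(name in field for field in fields[1:]):
            pids.append(int(fields[0]))
    return pids
```

Paste the output of ps into a string and pass it to hostd_pids to get the list of matching PIDs for review.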

0 Kudos
AntonVZhbankov
Immortal

tman24, that's why I have VI Enterprise with VMotion. I can just live migrate all VMs, reboot ESX, migrate the VMs back, and no one will notice. :)


0 Kudos
cmacmillan
Hot Shot

If it's ESXi and you're using SSH then you've modified /etc/inetd.conf. Assuming you edited /etc/inetd.conf and killed inetd (kill -HUP) to enable busybox, could you have killed the wrong process or created an error in /etc/inetd.conf? Output of "ps a" might help...

--Collin C. MacMillan

SOLORI - Solution Oriented, LLC

Collin C. MacMillan, VCP4/VCP5 VCAP-DCD4 Cisco CCNA/CCNP, Nexenta CNE VMware vExpert 2010-2012 SOLORI - Solution Oriented, LLC http://blog.solori.net If you find this information useful, please award points for "correct" or "helpful".
0 Kudos
tman24
Enthusiast

Yes, I've edited inetd.conf to enable SSH on all my ESXi servers, and it's worked without fault. This server has been online for almost 3 months now though, and I haven't edited any local file since it went live, so I do not suspect any editing of files has caused any problems.

0 Kudos
PabloOttawa
Enthusiast

Tman,

This might look really stupid and elementary, but it is worth checking: we had the same problem with some ESX hosts in the past. SSH was fine and processes on the server were fine, but vCenter couldn't connect.

It turned out that the vCenter network connection was configured to use 100Mbps full duplex while the switch port was configured to use 100Mbps half duplex (it is a network being upgraded). So I would make sure that the network connections from vCenter to the switch and from ESX to the switch all share the same configuration. If possible, avoid using autonegotiate and hardcode them.

Just my 2 cents... that was our problem at that time...

Pablo

0 Kudos