Can't connect to ESX host from VC or VIC

SRuff · ‎08-24-2007

I have a 4-node ESX 3.0.1 cluster, running VC 2.0.2 and VIC, and one of my nodes suddenly lost connectivity with VC yesterday. I tried connecting through VIC and that was not successful either. I have restarted every service I can think of, including

vmware-vpxa

mgmt-vmware

xinetd

vmware-vmkauthd

and still no success. I sat on the phone with VMware support for over 2 hours while they tried everything they could think of, still with no success. Below is what is in the logs, which seems to be the issue. They told me I needed to reboot the server, but I have 31 production VMs running (and they are running just fine), so I'd like to use the reboot as a last resort, or at least get the VMs migrated before I do so, but I can't do anything right now.

Error in the /var/log/vmware/vpx/vpxa.log

\[2007-08-24 15:49:06.407 'App' 6839216 error] \[VpxVmdbCnx] Authd error: 514 Error connecting to hostd-vmdb service instance.

\[2007-08-24 15:49:06.407 'App' 6839216 error] \[VpxVmdbCnx] Failed to connect to host :902. Check that authd is running correctly (lib/connect error 11)

\[2007-08-24 15:49:26.427 'App' 3034032 error] \[VpxVmdbCnx] Authd error: 514 Error connecting to hostd-vmdb service instance.

\[2007-08-24 15:49:26.427 'App' 3034032 error] \[VpxVmdbCnx] Failed to connect to host :902. Check that authd is running correctly (lib/connect error 11)

\[2007-08-24 15:49:46.448 'App' 13265840 error] \[VpxVmdbCnx] Authd error: 514 Error connecting to hostd-vmdb service instance.

\[2007-08-24 15:49:46.448 'App' 13265840 error] \[VpxVmdbCnx] Failed to connect to host :902. Check that authd is running correctly (lib/connect error 11)

\[2007-08-24 15:50:06.468 'App' 6839216 error] \[VpxVmdbCnx] Authd error: 514 Error connecting to hostd-vmdb service instance.

\[2007-08-24 15:50:06.469 'App' 6839216 error] \[VpxVmdbCnx] Failed to connect to host :902. Check that authd is running correctly (lib/connect error 11)
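To see how often the agent is hitting this error, the matching lines in vpxa.log can be counted. This is a minimal sketch, assuming a POSIX shell with grep on the service console; `count_authd_514` is a hypothetical helper name, not a VMware tool.

```shell
#!/bin/sh
# count_authd_514 (hypothetical helper): print how many lines in the
# given log file contain the "Authd error: 514" message quoted above.
count_authd_514() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps the function's status clean.
    grep -c 'Authd error: 514' "$1" || true
}
```

For example, `count_authd_514 /var/log/vmware/vpx/vpxa.log` would print the number of 514 errors logged so far. A count that keeps growing suggests the agent is still retrying.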

esiebert7625 · ‎08-24-2007

Check what services are running. Is your hostd service running?

service --status-all |grep running

Also try:

ps -ef | grep vmware-hostd

Do you see a /usr/lib/vmware/hostd/vmware-hostd running?
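One thing to watch for: a plain `ps -ef | grep vmware-hostd` can also list the grep command itself, which makes it look as if something is running when it is not. A small sketch of a filter that drops that line, assuming a POSIX shell; `hostd_procs` is a hypothetical helper name.

```shell
#!/bin/sh
# hostd_procs (hypothetical helper): read "ps -ef" style output on stdin
# and keep only lines that mention vmware-hostd, excluding the grep
# command that is doing the searching.
hostd_procs() {
    grep 'vmware-hostd' | grep -v ' grep '
}
```

Usage would be `ps -ef | hostd_procs`. The watchdog wrapper line is kept as well, since its command line also names vmware-hostd.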

SRuff · ‎08-24-2007

Yeah, everything seems to be running

\# service --status-all|grep running

cmanicd (pid 1988) is running...

crond (pid 1165) is running...

gpm (pid 1046) is running...

hpasmd is running...

cmathreshd is running...

cmahostd is running...

cmapeerd is running...

cmastdeqd is running...

cmahealthd is running...

cmaperfd is running...

cmaeventd is running...

cmaidad is running...

cmafcad is running...

cmaided is running...

cmasm2d is running...

hpsmhd (pid 1646 1119) is running...

vmware-hostd (pid 6419) is running...

ntpd (pid 1037) is running...

cimserver (pid 2724) is running...

Ramchecker is not running

snmpd (pid 939) is running...

sshd (pid 3522 948) is running...

syslogd (pid 906) is running...

klogd (pid 910) is running...

At least one virtual machine is still running.

VMware VMkernel authorization daemon is running (pid 23582).

vmware-vpxa is running

webAccess (pid 1144) is running...

xinetd (pid 1081) is running...

\# ps -ef|grep vmware-hostd

root 6414 1 0 15:25 pts/0 00:00:00 /bin/sh /usr/bin/vmware-watchdog -s hostd -u 60 -q 5 -c /usr/sbin/hostd-support /usr/sbin/vmware-hostd -u -a

root 6419 6414 0 15:25 ? 00:00:02 /usr/lib/vmware/hostd/vmware-hostd /etc/vmware/hostd/config.xml -u -a

darren_boyd · ‎08-24-2007

Have you tried deleting the vpxuser and then trying to reconnect to VC?

SRuff · ‎08-25-2007

Yeah, then I just get bad username or password when trying to connect to the ESX host.

darren_boyd · ‎08-25-2007

Now try restarting the initial services as per your earlier post:

vmware-vpxa

mgmt-vmware

xinetd

vmware-vmkauthd

SRuff · ‎08-25-2007

Well, I rebooted the server as support suggested and am still getting the same error. The good thing is that during the reboot, HA kicked in and moved all of my VMs, which were shut down, to other hosts in the cluster and started them up. So at least I'm able to manage the VMs now, I just can't do anything with this host.

SRuff · ‎08-25-2007

Finally, I followed Darren's instructions: removed the host from VirtualCenter, deleted the vpxuser, restarted all of the services and reconnected it to VC, and it's up and functioning now. I would still like to know what happened, as it's happened to another host in my cluster now, but I'm glad to have it up. Thanks for all the help. I wonder why VMware support didn't come up with this solution...

Jim · ‎03-11-2008

Just got off the phone with a support engineer trying to troubleshoot the exact same issue after upgrading from VC 2.0.1 to 2.5. One host, of roughly 15, wouldn't install the new VC client correctly. As a result, I ended up with the 514 error condition as well. I had already performed all the steps mentioned here prior to submitting a support case. Even a reboot of the ESX host didn't break things free.

It seems that if your ESX hosts file doesn't (for reasons that simply escape explanation at this point) have an entry for localhost that points to loopback (as it should), then the authd process can't provide proper credentials and the client upgrade package can't install. Check your host(s) prior to upgrading. Your hosts file should look like:

127.0.0.1 localhost.localdomain localhost

It shouldn't look like:

127.0.0.1 <hostname.something.xyz> <hostname>

Enjoy,

Jim
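The hosts file check Jim describes can be run across hosts before an upgrade. A minimal sketch, assuming a POSIX shell and awk; `loopback_ok` is a hypothetical helper name, and the file path is passed in so it can be pointed at `/etc/hosts` or a copy.

```shell
#!/bin/sh
# loopback_ok (hypothetical helper): succeed (exit 0) if the given hosts
# file has a 127.0.0.1 line that names "localhost", fail (exit 1) otherwise.
loopback_ok() {
    awk '
        { sub(/#.*/, "") }                 # ignore comments
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++)
                if ($i == "localhost") found = 1
        }
        END { exit !found }
    ' "$1"
}
```

For example: `loopback_ok /etc/hosts || echo "127.0.0.1 does not map to localhost"`.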

GVM · ‎04-23-2008

Folks

We had a similar problem in our ESX 3.0.1 environment.

hostd would start and then stop; basically the same error as displayed above.

The cause was a failure halfway through patching the server up to 77862. We managed to get it back up by manually applying patch ESX-1003508 from the command line with esxupdate.

After that we re-ran all the patches and all is well again.

Hope this helps someone :smileygrin:

erickmiller · ‎05-05-2008

Hi,

We just had a similar issue with the authd errors on one of our ESX 3.0.2 clusters. We had a SAN controller lock up on us, so some of our LUNs were unavailable. This tends to cause a lot of havoc in hostd and some of the other services as they initialize, since one or more services look at all of the datastores connected to the host. The timeout is extremely long and seems to cause a complete failure when some of the LUNs are unavailable. After waiting long enough, we saw entries in the hostd log file indicating that it was attempting to connect to a datastore that was unavailable.

Eric K. Miller, Genesis Hosting Solutions, LLC http://www.genesishosting.com/ - Lease part of our ESX cluster!

cougar694u · ‎10-13-2008

I had 4 hosts disconnected from my 6 node cluster this morning. One host reconnected with just a right click -> connect, but the other three didn't.

One of the remaining hosts reconnected after following the above steps (deleting vpxuser and restarting services), but I still had two that wouldn't connect.

I uninstalled the vpx agent on those and that seems to have worked. Here are the steps I used to uninstall vpxa and let VC reinstall it:

Check for vpxa version: rpm -qa |grep vpxa

You should see something like VMware-vpxa-2.5.0-104215, you'll use this later

Stop the VMware management service: service mgmt-vmware stop

Stop the vpx agent: /etc/init.d/vmware-vpxa stop

Uninstall vpx agent: rpm -e VMware-vpxa-2.5.0-104215

Expect the following: 'warning: /etc/vmware/vpxa.cfg saved as /etc/vmware/vpxa.cfg.rpmsave'

Verify vpxa has uninstalled: rpm -qa |grep vpxa (or vpx just in case)

Start the VMware management service: service mgmt-vmware start

Now go back into VirtualCenter and remove the disconnected host and add the host. Initially, it may fail with bad username or password or another error, but try again and it should work.

~Luke http://thephuck.com
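The uninstall steps above can be collected into a dry-run helper that only prints the commands, so they can be reviewed before anything is run on a host. This is a sketch under assumptions: `vpxa_removal_plan` is a hypothetical name, and the package name must be whatever `rpm -qa | grep vpxa` reports on your host.

```shell
#!/bin/sh
# vpxa_removal_plan (hypothetical helper): print, but do not run, the
# commands from the steps above for the given vpxa package name.
vpxa_removal_plan() {
    pkg=$1   # the name reported by: rpm -qa | grep vpxa
    if [ -z "$pkg" ]; then
        echo "usage: vpxa_removal_plan <vpxa-package-name>" >&2
        return 1
    fi
    cat <<EOF
service mgmt-vmware stop
/etc/init.d/vmware-vpxa stop
rpm -e $pkg
rpm -qa | grep vpxa
service mgmt-vmware start
EOF
}
```

For example, `vpxa_removal_plan VMware-vpxa-2.5.0-104215` prints the five commands in order. Removing and re-adding the host in VirtualCenter is still a manual step afterwards.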

admin · ‎11-11-2008

Make sure your hosts can still see the SAN storage. I ran into this problem recently and the host had lost connection to the storage. The VMs looked like they were still running though. We rescanned the storage adapters and all was well after this. Most of the VMs needed to be rebooted as they must have blue screened and were sitting at the "no Operating System found" screen.

~harry

NCHvm · ‎11-20-2008

Sounds stupid, but we had this problem and realised the host was out of disc space. The HA agent updated and caused a load of core dumps, which had filled up the disc. I deleted everything in /var/core and it connected again straight away, so might be worth checking. I was getting all sorts of strange messages e.g.

Network copy failed for file.

C:\Program Files\VMware\Infrastructure\VirtualCenter Server\upgrade\vpx-upgrade-esx-7-linux-119598

Failed to install the VirtualCenter Agent Service

Cannot connect to host 'unable to contact the specified host'

Unable to communicate with the remote host, since it is disconnected

Feel a bit thick for not spotting that straight away, but hope this helps someone!

Ed
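A quick way to rule out the full-disk case Ed describes is to read the usage percentage from `df`. A minimal sketch, assuming a POSIX shell with `df -P` and awk; `parse_df_pct` and `over_limit` are hypothetical helper names, and the default threshold of 90 is an arbitrary choice.

```shell
#!/bin/sh
# parse_df_pct (hypothetical helper): read "df -P" output on stdin and
# print the Capacity column of the first filesystem line, without the %.
parse_df_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# over_limit PCT [LIMIT] (hypothetical helper): succeed when PCT is at
# or above LIMIT. LIMIT defaults to 90, an arbitrary threshold.
over_limit() {
    [ "$1" -ge "${2:-90}" ]
}
```

For example: `pct=$(df -P /var/core | parse_df_pct); over_limit "$pct" && echo "filesystem holding /var/core is ${pct}% full"`.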

raghun79 · ‎06-17-2011

I did the following and was able to connect to ESX server from VC:

#@$ ps -ef | grep vmware-hostd

root 9059 3324 0 14:10 pts/0 00:00:00 grep vmware-hostd

root 18399 1 0 13:53 pts/0 00:00:00 /bin/sh /usr/bin/vmware-watchdog -s hostd -u 60 -q 5 -c /usr/sbin/vmware-hostd-support /usr/sbin/vmware-hostd /etc/vmware/hostd/config.xml -u

root 18412 18399 1 13:53 ? 00:00:14 /usr/lib/vmware/bin/vmware-hostd /etc/vmware/hostd/config.xml -u

#@$ kill -9 18399

#@$ ps -ef | grep vmware-hostd

root 9185 3324 0 14:11 pts/0 00:00:00 grep vmware-hostd

root 18412 1 1 13:53 ? 00:00:14 /usr/lib/vmware/bin/vmware-hostd /etc/vmware/hostd/config.xml -u
