VMware Cloud Community
Raze
Contributor

Host is disconnected...

Hi,

Came into work this morning to find that one of my ESX hosts is appearing as disconnected in Virtual Center, and all the VMs on this host are also showing as disconnected. I can't log into the host using the VI Client. The error is: VMware Infrastructure Client could not establish the initial connection with server "ip address". Details: A connection failure occurred.

I can Remote Desktop onto all the VMs showing as disconnected and they seem to be working fine.

I can also ping the ESX host and it replies, and when I control the ESX host remotely via RSA it appears as normal, showing me the IP address.

I have another esx host in the cluster and this is working fine.

When I check the events on the host that isn't working there is an error showing at 5:30am this morning that says Host "hostname" in "Datacenter Name" is not responding.

I've tried restarting the Virtual Center service and the host showed up as connected for about 30 seconds, then went back to disconnected.

ESX Server 3.0.2, Virtual Center 2.5.0

Any ideas on how to resolve this would be greatly appreciated. I don't have in-depth knowledge of VMware, so please go easy on me!

33 Replies
heynakin
Contributor

I have been fighting this for a week. Make sure your /var/log is not full. Mine was filling up, and this kept my host in a disconnected state.
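A minimal sketch of that check, for anyone who wants to run it from the service console. It reads the POSIX-format output of df for the filesystem holding /var/log and flags it when usage crosses a threshold. The 90% threshold and the function names are illustrative assumptions, not anything VMware ships.

```shell
#!/bin/sh
# Hypothetical warning threshold in percent; pick whatever suits your host.
THRESHOLD=90

# Read "df -P" output on stdin and print the Capacity column of the first
# data row without the trailing % sign.
parse_capacity() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print WARNING if the percentage is at or above the threshold, else OK.
classify() {
    if [ "$1" -ge "$2" ]; then echo "WARNING"; else echo "OK"; fi
}

# Report how full the filesystem holding a given path is.
check_path() {
    pct=$(df -P "$1" 2>/dev/null | parse_capacity)
    case $pct in
        ''|*[!0-9]*) echo "could not read usage for $1"; return 0 ;;
    esac
    echo "$(classify "$pct" "$THRESHOLD"): $1 is ${pct}% full"
}

check_path /var/log
```

If /var/log is not a separate partition on your host, df reports the filesystem that contains it, so the figure still tells you whether logging can run out of room.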

heynakin
Contributor

try this:

1st

service vmware-vpxa stop

2nd

service mgmt-vmware stop

3rd

service vmware-vpxa restart

4th

service mgmt-vmware restart
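The four steps above can be wrapped in a small script so they always run in the same order. This is only a sketch: it uses exactly the commands from this post, and the DRY_RUN switch and function names are assumptions added for illustration. By default it prints what it would do instead of touching the services.

```shell
#!/bin/sh
# Set DRY_RUN=0 to really run the commands; the default only prints them.
DRY_RUN=${DRY_RUN:-1}

# Either echo the command (dry run) or execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Same order as the steps in the post: stop both agents, then bring them back.
restart_agents() {
    run service vmware-vpxa stop
    run service mgmt-vmware stop
    run service vmware-vpxa restart
    run service mgmt-vmware restart
}

restart_agents
```

Run it once as is to review the printed commands, then again with DRY_RUN=0 as root on the ESX service console.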

Karunakar
Hot Shot

Please try the following. This is an example from my machine.

# rpm -qa | grep vpx

VMware-vpxa-2.0.1-32042

# rpm -e VMware-vpxa-2.0.1-32042

Also make sure that there is a folder called vmware-root in /tmp

-Karunakar

gtyrer
Contributor

I would also look at space.

Log into the console and run "df -h". This will show you the mount point usage; have a look at the Avail column.

If any show 0 available, change to that mount point and run "du -sh *". This will show folder/file usage. Keep moving down the folder structure to find the largest file.

I had a similar problem with HA: when I enabled it in VC, the /var/log/vmware/aam/<hostname> _agent.out log on the ESX host grew to gigs in size and stopped the console from responding. I deleted the file, reconfigured for HA and the host responded again.
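The drill-down described above can be sketched as a small helper. It is a variation on "du -sh *" rather than the same command: it uses kilobytes instead of human-readable sizes so the output can be sorted numerically, and -x so du stays on the one mount point. The function name and the default of five entries are assumptions for illustration.

```shell
#!/bin/sh
# List the N largest entries directly under a directory, sizes in KB.
# -k fixes the unit so "sort -n" is meaningful; -x stays on one filesystem.
largest_entries() {
    dir=$1
    count=${2:-5}
    # Like "du -sh *", the glob skips hidden files and folders.
    ( cd "$dir" && du -skx -- * 2>/dev/null | sort -rn | head -n "$count" )
}
```

For example, "largest_entries /var/log 5" lists the five biggest items under /var/log. Repeat it on whichever folder comes out on top until you reach the file that is eating the space.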

Raze
Contributor

Strange things seem to be happening here. Yesterday evening the host reconnected after I right-clicked it and selected Connect. But when I clicked the Summary tab, the CPU and memory usage were showing as 0. I clicked on the Virtual Machines tab and all VMs were showing as powered on, but host CPU and memory were also showing as 0.

I had to leave for the evening so I just left it like that. I came back in this morning and all hosts are connected and showing normal CPU and memory usage figures, and the summary is showing normal figures as well.

I've tried doing a df -h and the Avail column is showing 2.9G, 65M, 131M and 1.8G.

If I remove the host from the inventory and then re-add it using the Add Host option, what happens to the virtual machines that were on that host? Should I migrate the VMs to the other ESX server while I remove and re-add the faulty one?

Even though it is working now I'm not sure what caused the error in the first place or what actually fixed it so I'd still like to find out the cause of the problem in case it happens again.

Thanks

Rohail2004
Enthusiast

Yes, the VMs will keep running as usual if you remove the host from the inventory. You do not need to migrate the VMs. Remember, your VMs were also running as usual while your host was showing "disconnected".

Raze
Contributor

It says I need to put the host into Maintenance Mode first. When I do this, it says "it will cause the cluster to violate its configured failover level for HA." I click Yes and it then goes on to say that "a host in maintenance mode doesn't perform any vm related functions etc. to complete entry into maintenance mode all VM's must be shut down or moved to another host. Manual intervention may be required"

And below this there is a ticked box with the text "Evacuate powered off and suspended virtual machines"

So this is suggesting I need to shut down the VMs, isn't it?

Rohail2004
Enthusiast

The first message is normal, but I have never seen the second message. As a matter of fact, I just removed one of my hosts from the cluster and I got the first message, but I did not get the message you are getting that the host has to be in maintenance mode. I was able to add my host again without any issue. I don't know what is happening with your system.

gtyrer
Contributor

If a host is a member of a cluster that has HA enabled, then you have to put it into maintenance mode before you can remove it. This is because the ESX hosts monitor one another for HA to work and will try to power on the VMs that they think have failed.

What about first disabling HA on the cluster and then removing the host - just a thought.

Karunakar
Hot Shot

Hi,

Yes, as the machine is in an HA cluster, you have to put the machine in maintenance mode.

That would again involve migrating the VMs to the other host.

As you are able to see the ESX host in the inventory of the VC server, now try to perform a mgmt-vmware restart again.

This may refresh the VC inventory. If possible, you can also restart the VC server.

-Karunakar

Rohail2004
Enthusiast

Yes, as the machine is in an HA cluster, you have to put the machine in maintenance mode.

Rohail2004
Enthusiast

Sorry for the last post, I accidentally hit submit before finishing my thought.

Karun, you said "Yes, as the machine is in an HA cluster, you have to put the machine in maintenance mode". So let me ask you this: if VC crashed, would the VMs running in a cluster stop working? That doesn't make sense. So are you saying that if the machine is in an HA cluster and you remove it without putting it in maintenance mode, those VMs will not work?

Karunakar
Hot Shot

Hi Rohail,

As far as I understand, in classic ESX, HA will continue to run in case of a VC crash, as HA is an integral part of ESX and is only initiated and configured from VC.

For more discussion on this, please follow the link below:

http://communities.vmware.com/thread/43477

I asked him to put it into maintenance mode so that the HA cluster configuration is properly configured when he wants to add it back, and in order to remove it we have to put that machine in maintenance mode.

And for this to happen, the machine should not have any VMs powered on.

-Karunakar

Raze
Contributor

I'll have to move all of the VMs over to the other ESX server and remove the faulty one at the weekend; I don't want to chance doing it during working hours. Plus, I migrated some of the VMs already and the memory usage has gone up quite a bit on the host that is functioning normally, so I don't want to put further strain on it now.

Will try and do it this weekend if I get a chance and let you know how it goes.
