VMware Cloud Community
radman
Enthusiast

Attempting to Clone VM results in "Failed to connect to host"

A while back I changed the IP address of my ESX server COS and all the VMs to a new subnet.

Everything is working fine, except that when I try to Clone a VM, it fails with "Failed to connect to host".

I saw a post that suggested editing /etc/opt/vmware/vpxa/vpxa.cfg on the COS, and sure enough my old address is in there under hostIP.

I can change the address, but it doesn't help. I've tried disconnecting/reconnecting the ESX server from the VC, but when I do this the old, bad hostIP keeps showing up again.

I've tried disconnecting the server, shutting down the VC VM, putting the server into maintenance mode, editing the file, exiting maintenance mode, restarting the VC VM, and reconnecting, but I get the same result.

I don't have a cluster - this is a standalone server, where VC is running within a VM on that server.

Help, anyone?

4 Replies
radman
Enthusiast

I figured it out. The trick is to remove the ESX server from the Datacenter in VC entirely, then edit the file, then add it back in. Life is good now.

BenConrad
Expert

Alternatively (this way you don't lose historical data):

1. Stop vCenter

2. Stop vmware-vpxa

3. Edit /etc/opt/vmware/vpxa/vpxa.cfg

4. In the VCDB, edit VPX_HOST and change the IP to the proper IP address

5. Start vCenter

6. Start vmware-vpxa

HTH.

lamw
Community Manager

I’ve recently seen this exact issue as well. The problem is as you’ve described: re-IPing the Service Console IP address while the ESX host is attached to vCenter. I don’t think this is such an unheard-of operation, and it can be done within a cluster so long as you take proper steps such as putting the host into Maintenance Mode and properly reconfiguring DNS. When you first add a host to vCenter, vpxa (the vCenter agent) is pushed to your ESX host and configured. It retrieves the Service Console IP and writes out a config to /etc/opt/vmware/vpxa/vpxa.cfg, which contains both your SC IP and vCenter IP:

<hostIp>A.B.C.D</hostIp> # ESX Service Console IP

<serverIp>E.F.G.H</serverIp> # vCenter IP

This information is then also written to the vCenter VCDB, in the VPX_HOST table under DNS_NAME and IP_ADDRESS.
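To see what vpxa currently has recorded on a host, the two values can be pulled out of the config file with sed. This is only a minimal sketch: `cfg_value` is a hypothetical helper name, and it assumes each tag sits on a single line as in the excerpt above.

```shell
#!/bin/sh
# cfg_value TAG FILE
# Print the text between <TAG> and </TAG> in a vpxa.cfg-style file.
# Hypothetical helper, not a VMware tool; assumes the opening and
# closing tag are on the same line. Only the first match is printed.
cfg_value() {
    tag="$1"
    file="$2"
    sed -n "s:.*<$tag>\([^<]*\)</$tag>.*:\1:p" "$file" | head -n 1
}

# Example calls on an ESX host (path taken from the post):
#   cfg_value hostIp   /etc/opt/vmware/vpxa/vpxa.cfg
#   cfg_value serverIp /etc/opt/vmware/vpxa/vpxa.cfg
```

If the tag is missing, the function prints nothing, which makes it easy to test for in a wrapper script.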

The problem occurs when you try to re-ip the Service Console while it’s still connected:

1) Put host in maintenance mode

2) Update your DNS, so that your hostname will resolve to the new IP and vice versa

3) Log in directly to the console (iLO or DRAC) and re-IP by using esxcfg-vswif

4) Host will probably disconnect while this new update occurs and the changes are flushed out

5) vCenter will then see the host and reconnect OR you can manually re-connect

6) vCenter will say, hey everything is good again and welcome back HostX

Everything will seem to be okay: vMotion continues to work, HA continues to function, and full DNS resolution (both forward and reverse) works from both the vCenter Server and the ESX host.

Unfortunately, everything is not Sunny in Philadelphia … if you perform this operation while the host is still connected to vCenter, for some unknown reason vpxa will not push the new updates to vCenter. This isn’t such a big deal, right? … Well, it is if you’re trying to clone a VM, whether from a template or from another VM. The exact error message you’ll see from the task is quite descriptive … ‘Cannot connect to host’. Cloning even fails within the same host that is managing the VM you're cloning from/to, whether it is on local VMFS storage, FC SAN VMFS, or an NFS datastore!

After some digging with support, we found some NFC errors, but they still did not pinpoint the exact issue. We finally concluded that it had to be vCenter: it proxies the clone operation, and the operation fails during initialization and never reaches the destination host, because vCenter looks up the hostname using its internal DB.

We finally took a look at /etc/opt/vmware/vpxa/vpxa.cfg and grepped for hostIp, which contains the Service Console IP address, and lo and behold, it was set to the old Service Console IP address and not the currently configured one. We did not want to lose performance data from our vCenter, so we tried various things to see if we could resolve the issue without removing the host from vCenter (which I had a feeling would fix the problem, but we wanted to identify the root cause):

1) Disconnect the host from vCenter, stop the vpxa service (service vmware-vpxa stop), remove vpxa.cfg, and restart the vpxa service, which then reconnects to vCenter and, we hoped, would pick up the correct Service Console IP. This failed: the old IP was still hard-coded in the configuration once it was restored.

2) Disconnect the host from vCenter, stop the vpxa service (service vmware-vpxa stop), remove vpxa.cfg, and reconnect the host in vCenter using the vSphere Client, hoping it would pick up the correct Service Console IP. This also failed.

Both of these were suggested because once the vpxa service is stopped and you try to reconnect, vCenter brokers a connection with your host, forcefully sends down the vpxa agent, and re-updates your host. We were hoping the configuration would be picked up again and updated in the VCDB, but unfortunately that was not the case.

It started to sound like the VCDB was overwriting any value for the Service Console IP because one already existed in its DB. In effect, vCenter was proxying its own stored copy of this information rather than relying on the host or picking up the latest change made on the host.

If you execute the following SQL query for a specific host, you’ll notice the Service Console IP address is set to the older address:

select DNS_NAME, IP_ADDRESS from VPX_HOST where DNS_NAME like '%some_esx_host%'

At this point it was clear what was going on, and there were two possible resolutions:

1) Manually update the VCDB with the proper Service Console IP address. I haven’t mucked with the VCDB for over two years, and with vSphere there were significant schema changes. Support did not have the query on hand, and I was not comfortable just updating this table; who knows what other tables/triggers it could affect. You would probably want to consult VMware Support first.

2) Remove the host from vCenter and re-add it, which definitely solves the problem (verified in our development environment).

I decided that it wasn’t worth the risk to manually update tables in the VCDB (if you do, make sure your DBAs take a backup), so we removed the affected hosts and re-added them to ensure proper configuration on both the ESX hosts and the vCenter Server.

Long story short, vCenter is not smart enough to handle a Service Console IP address change on an ESX server that is still connected to vCenter. I don’t think this is an unheard-of task, and I’m sure it is performed for many reasons. The assumption being made is that your ESX Service Console IP will never change once you add the host to vCenter, which is a pretty bad assumption … I’m hoping it’s just a bug or miswritten logic. You could argue that you should remove the host from vCenter first, but if you put it in Maintenance Mode, nothing is vMotioning to it, and you won't lose any performance/statistical data, which is pretty important to us and I'm sure to others as well.

I also wrote a small script to detect this type of inconsistency. You can execute it locally on each ESX host, or store it on shared storage and kick off a quick for loop that SSHes into each ESX host, to verify that the ESX SC IP address is consistent with what the vCenter VCDB ‘thinks’ the ESX SC IP is.

Here is an example of how it works:

Good:

[root@simplejack.primp-industries.com ~]# /vmfs/volumes/dlgCore-NFS/verifyVpxaAndHostConf.sh
"simplejack.primp-industries.com" Service Console IP is consistent with vCenter!

Bad:

[root@speedman.primp-industries.com ~]# /vmfs/volumes/dlgCore-NFS/verifyVpxaAndHostConf.sh

ERROR: "speedman.primp-industries.com" Service Console IP is NOT consistent with vCenter
        ESX Service Console IP: 172.30.0.100
        vCenter ESX Service Console IP stored: 172.30.0.200
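The script itself is not attached to this post. A minimal sketch of such a check might look like the following, assuming the stale value is the hostIp entry in vpxa.cfg (as described earlier in the thread) and that the caller supplies the host's current Service Console IP. The function name `check_sc_ip` and its arguments are hypothetical; the messages simply mirror the sample output above.

```shell
#!/bin/sh
# check_sc_ip CFG_FILE CURRENT_IP [HOSTNAME]
# Compare the <hostIp> value stored in CFG_FILE with CURRENT_IP.
# Hypothetical sketch, not the original verifyVpxaAndHostConf.sh.
# Returns 0 if consistent, 1 if not, 2 if no <hostIp> entry is found.
check_sc_ip() {
    cfg="$1"
    current_ip="$2"
    name="${3:-$(hostname)}"

    # Assumes <hostIp>...</hostIp> sits on a single line.
    stored_ip=$(sed -n 's:.*<hostIp>\([^<]*\)</hostIp>.*:\1:p' "$cfg" | head -n 1)

    if [ -z "$stored_ip" ]; then
        echo "ERROR: no <hostIp> entry found in $cfg"
        return 2
    fi

    if [ "$stored_ip" = "$current_ip" ]; then
        echo "\"$name\" Service Console IP is consistent with vCenter!"
        return 0
    fi

    echo "ERROR: \"$name\" Service Console IP is NOT consistent with vCenter"
    echo "        ESX Service Console IP: $current_ip"
    echo "        vCenter ESX Service Console IP stored: $stored_ip"
    return 1
}

# Example call; the current IP is passed in by hand here:
#   check_sc_ip /etc/opt/vmware/vpxa/vpxa.cfg 172.30.0.100
```

Passing the current IP as an argument keeps the sketch independent of how the address is read on a given ESX build; on a real host it could be taken from the vswif interface configuration.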

If you find inconsistencies, I highly recommend remediating them as soon as you can, or you could run into this problem when trying to deploy a clone from a VM/template.

=========================================================================

William Lam

VMware vExpert 2009

VMware ESX/ESXi scripts and resources at:

VMware Code Central - Scripts/Sample code for Developers and Administrators


ferreepa
Contributor

This post was very helpful. I changed the IP address on the Service Console during our VMware vSphere upgrade, and sure enough the VCDB never reflected the change. I put the two hosts whose IP addresses I had changed into Maintenance Mode, removed them from vCenter, and then added them back in. Both the DB and the vpxa config file were updated.
