VMware Cloud Community
76dragon
Enthusiast

Cannot add ESX host to VC: "Unable to access the specified host"

"Unable to access the specified host. It either does not exist, the server software is not responding, or there is a network problem."

I've got a VC server with no firewall between it and the ESX hosts (all of which are showing as disconnected), and everything is on the same segment. If I telnet to the ESX host from the Windows box using telnet servername 902, it connects OK. But if I try to add the host, it fails with the message above.

I've verified there is no additional backed-up file in /etc/vmware/firewall, which can cause this issue.

If I try to add the host with an incorrect user and password, it actually tells me that the password is incorrect; if I try adding it using the root account, I get the error above.

Windows Firewall is disabled on the VC server.

esxcfg-vswif -l gives me output showing the correct IP bound to vswif0:

Name     Port Group        IP Address     Netmask         Broadcast       Enabled   DHCP
vswif0   Service Console   192.168.0.21   255.255.255.0   192.168.0.255   true      false

Anyone have any suggestions on what I should check next?
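
For anyone who wants to script the telnet check described above, here is a minimal Python sketch. The function name can_connect and the 3-second timeout are my own choices, not anything from VMware. Keep in mind that a successful connect only shows that the port is reachable; as this thread shows, the add can still fail.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # Same idea as "telnet servername 902": open the connection, then drop it.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For the host in this post that would be can_connect("192.168.0.21", 902), run from the VC server.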

36 Replies
76dragon
Enthusiast

Hi, can you let me know how you narrowed it down to VPXuser?

I've narrowed my problem down to the services.xml file being altered, which caused the host to lose its connection once the mgmt-vmware restart was issued. I've been able to replicate this and realised that the format of what I added to the file caused it. The trouble is that once I copied the backup of /etc/vmware/firewall/services.xml back to its original location and rebooted, the server was still broken. I flushed iptables using iptables -F, but this didn't work as I had hoped. Due to time pressure I had to get the system back up and running, so I just rebuilt it, choosing to keep the VMFS partitions. Maybe over Christmas I'll get a chance to break it again and try to figure out how to fix it and what's actually occurring. This is all covered in my other post here:

http://www.vmware.com/community/thread.jspa?threadID=65226&tstart=60

To delete a user from the console you can use deluser

JonT
Enthusiast

I found that if I was connected to the VC server AND the host using two separate VI Clients, I saw an error on the host basically saying there was a duplicate key: the specified user already exists.

OK, here is where I am now. I found that removing the VPXUser account (the command is userdel, not deluser, for ESX3) allows the VC server to re-create the account when I attempt to add the host, and I no longer get the duplicate-key error. The VC server still returns the "Unable to access the specified host" error in the VI Client. I am trying to look at the services.xml file, but again I am a Linux newb, so I am unfamiliar with the ESX command line and commands. I really should get a "Linux for Dummies" type book. Just FYI, I had come across your post about the services.xml file before posting to this thread about my problem, but at the time the user account issue seemed more relevant due to the error I was seeing. Also note that throughout all of my problems here I have never had a problem connecting to any of my hosts using the VI Client or the web interfaces.

JonT
Enthusiast

Oh, and also, when I remove the "disconnected" object from the VC server's inventory, the host records the "Remove Entity Permission" as successful twice. How do I crack open that services.xml file to check it? I have been trying to log in to my host using both PuTTY and SecureCRT, but it will not authenticate me with my root account/password for some reason.

Any ideas?

76dragon
Enthusiast

Hi. Firstly, sorry for my typo on the command to delete the account.

ESX3 is locked down so you can't PuTTY in as root, but you can log in as a normal user and then use su - and supply the root password. Or you can go to the console and use the command vi /etc/ssh/sshd_config, change the line "PermitRootLogin no" to "PermitRootLogin yes", and then issue the command "service sshd restart". You should now be able to log in via PuTTY using the root account.
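
If you would rather not hand-edit the file in vi, the same change can be sketched in Python. This only transforms the text of sshd_config; the function name permit_root_login is made up for illustration, and I am assuming you run it somewhere with a current Python 3 against a copy of the file, not necessarily on the service console itself. You still need "service sshd restart" afterwards.

```python
import re

# Matches an active (uncommented) PermitRootLogin line; sshd keywords are case-insensitive.
_ACTIVE = re.compile(r"^[ \t]*PermitRootLogin[ \t]+\S+.*$", re.MULTILINE | re.IGNORECASE)


def permit_root_login(config_text: str) -> str:
    """Return sshd_config text with PermitRootLogin set to yes."""
    new_text, count = _ACTIVE.subn("PermitRootLogin yes", config_text)
    if count == 0:
        # No active directive found (it may only exist as a comment): append one.
        sep = "" if (not new_text or new_text.endswith("\n")) else "\n"
        new_text = new_text + sep + "PermitRootLogin yes\n"
    return new_text
```

Commented-out lines such as "#PermitRootLogin no" are left alone on purpose, so the original default stays visible in the file.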

To be honest, if you have not altered the services.xml file previously, I doubt this will be the problem.

JonT
Enthusiast

OK, well, I am almost at a dead end on this, but I will build another fresh ESX3 host and compare it to the disconnected host. My only worry is that if we run into this situation in production, we could be hurting. Basically, what started all of this was a power outage in the lab (Seattle winds last week).

76dragon
Enthusiast

I found it easier to just rebuild after spending many hours trying to work it out... On the ESX host that won't connect, can you run the command "iptables -L" and post the output here so I can compare it to mine, which is working after the rebuild?

JonT
Enthusiast

OK, like you, I am going to give up and rebuild. The one thing to note here is that I was able to add and remove my newly built host several times, but to replicate the problem I shut it down and "removed" the host from VC. That seems to be the trigger that permanently disconnects a host. Here is the output from the "iptables" command you asked me to run on the disconnected host (quite long, actually):

[root@waprdbrvmwh01 root]# iptables -L

Chain INPUT (policy DROP)
target prot opt source destination
ACCEPT all -- anywhere anywhere
valid-tcp-flags tcp -- anywhere anywhere
valid-source-address !udp -- anywhere anywhere
valid-source-address-udp udp -- anywhere anywhere
valid-source-address tcp -- anywhere anywhere tcp flags:SYN,RST,ACK/SYN
icmp-in icmp -- anywhere anywhere
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT tcp -- anywhere anywhere tcp dpt:vmware-authd state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:http state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:https state NEW
ACCEPT udp -- anywhere anywhere udp spts:bootps:bootpc dpts:bootps:bootpc
ACCEPT udp -- anywhere anywhere udp dpt:svrloc
ACCEPT tcp -- anywhere anywhere tcp dpt:svrloc state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:5989 state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:ssh state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:5988 state NEW

Chain FORWARD (policy DROP)
target prot opt source destination

Chain OUTPUT (policy DROP)
target prot opt source destination
ACCEPT all -- anywhere anywhere
valid-tcp-flags tcp -- anywhere anywhere
icmp-out icmp -- anywhere anywhere
ACCEPT udp -- anywhere anywhere udp spts:1024:65535 dpt:domain
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT tcp -- anywhere anywhere tcp dpt:vmware-authd state NEW
ACCEPT udp -- anywhere anywhere udp spts:bootps:bootpc dpts:bootps:bootpc
ACCEPT udp -- anywhere anywhere udp spt:svrloc
ACCEPT tcp -- anywhere anywhere tcp spt:svrloc state NEW
ACCEPT udp -- anywhere anywhere udp dpt:902 state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:27000 state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:27010 state NEW
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable

Chain icmp-in (1 references)
target prot opt source destination
ACCEPT icmp -- anywhere anywhere icmp echo-reply
ACCEPT icmp -- anywhere anywhere icmp echo-request
ACCEPT icmp -- anywhere anywhere icmp fragmentation-needed
DROP all -- anywhere anywhere

Chain icmp-out (1 references)
target prot opt source destination
ACCEPT icmp -- anywhere anywhere icmp echo-request
ACCEPT icmp -- anywhere anywhere icmp echo-reply
DROP all -- anywhere anywhere

Chain log-and-drop (7 references)
target prot opt source destination
LOG all -- anywhere anywhere LOG level debug tcp-options ip-options
DROP all -- anywhere anywhere

Chain valid-source-address (2 references)
target prot opt source destination
DROP all -- localhost.localdomain anywhere
DROP all -- 0.0.0.0/8 anywhere
DROP all -- anywhere 255.255.255.255

Chain valid-source-address-udp (1 references)
target prot opt source destination
DROP all -- localhost.localdomain anywhere
DROP all -- 0.0.0.0/8 anywhere

Chain valid-tcp-flags (2 references)
target prot opt source destination
log-and-drop tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/NONE
log-and-drop tcp -- anywhere anywhere tcp flags:FIN,ACK/FIN
log-and-drop tcp -- anywhere anywhere tcp flags:PSH,ACK/PSH
log-and-drop tcp -- anywhere anywhere tcp flags:ACK,URG/URG
log-and-drop tcp -- anywhere anywhere tcp flags:FIN,SYN/FIN,SYN
log-and-drop tcp -- anywhere anywhere tcp flags:SYN,RST/SYN,RST
log-and-drop tcp -- anywhere anywhere tcp flags:FIN,RST/FIN,RST

JonT
Enthusiast

I did my own comparison between a connected and a disconnected host. The disconnected host is missing the following under "Chain INPUT" and "Chain OUTPUT":

ACCEPT tcp -- anywhere anywhere tcp dpts:2050:5000 state NEW

ACCEPT udp -- anywhere anywhere udp dpts:2050:5000 state NEW

ACCEPT tcp -- anywhere anywhere tcp dpts:8042:8045 state NEW

ACCEPT udp -- anywhere anywhere udp dpts:8042:8045 state NEW
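
Comparing two "iptables -L" listings by eye is error-prone, so here is a rough Python sketch of the comparison. The function names parse_chains and missing_rules are hypothetical, and it assumes each rule sits on a single line (unwrap any lines that the terminal or forum broke at 80 columns first).

```python
from typing import Dict, List, Set


def parse_chains(listing: str) -> Dict[str, Set[str]]:
    """Map each chain name in `iptables -L` output to its set of rule lines."""
    chains: Dict[str, Set[str]] = {}
    current = None
    for raw in listing.splitlines():
        line = " ".join(raw.split())  # collapse column padding to single spaces
        if not line:
            continue
        if line.startswith("Chain "):
            current = line.split()[1]
            chains.setdefault(current, set())
        elif line.startswith("target "):
            continue  # column header row
        elif current is not None:
            chains[current].add(line)
    return chains


def missing_rules(working: str, broken: str) -> Dict[str, List[str]]:
    """Rules present on the working host but absent on the broken one, per chain."""
    good, bad = parse_chains(working), parse_chains(broken)
    result: Dict[str, List[str]] = {}
    for chain, rules in good.items():
        diff = sorted(rules - bad.get(chain, set()))
        if diff:
            result[chain] = diff
    return result
```

Feed it the saved output from the working host and from the disconnected host; anything before the first "Chain" line, such as the shell prompt, is ignored.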

JonT
Enthusiast

Of course, I got ahead of myself. The missing ports belong to the "EMC AAM Client".

Somehow I don't think this is the problem. Oh well, I will leave one host "disconnected" for further analysis, but the others are getting a rebuild.

Thanks for the suggestions.

conyards
Expert

Any thoughts around just reinstalling the VPX agent, copied from "\program files\vmware\Virtual center 2\upgrade" (or similar; it's late here)?

Check the XML in the folder to see what version you need.

SCP it to the affected hosts.

Log on to the host and su - to root.

Then sh "file name from the XML file at location you copied it to".

Might work, might not; it won't hurt to test if you're thinking of rebuilding anyway.

https://virtual-simon.co.uk/
JonT
Enthusiast

I rebuilt the affected hosts. It took me about 20 minutes to get 3 hosts back into my VC with the same VMs still available (on a shared MSA1000). The lesson learned is that while this problem should be resolved, rebuilding is much faster than troubleshooting.

SKrehlik
Contributor

OK, just like the rest of you, I am seeing this error. Except I can't add my ESX host at all. I can't even get to it with the VI Client.

Here's the kicker: after a bunch of troubleshooting, including digging around in /var/log/* files, reading all kinds of logs, running chkconfig --list and digging through directories, I find no mention of vpxa. I think I might be missing that module completely.

My boss just ran the upgrade yesterday but I wasn't there to "watch over his shoulder" so I don't know which error messages, if any, came up. I do know he ran the upgrade twice.

To pre-empt some questions -

It's an IBM rackmount VM ESX 3.0 host (small implementation, < 10 VMs)

same subnet and physical net segment as the WinXPP VC / license server

static IP (verified through ifconfig as the correct IP on the correct NIC)

iptables and firewall are empty on ESX / Windows Firewall is disabled

PuTTY and telnet to 902 connect just fine

DNS resolves (using IP to attempt adding it didn't help)

using invalid username/pass gives authentication error

using valid username/pass gives the topic error

I've even generated new/combined license files and verified them on the license server.

Any and all ideas would be greatly appreciated.

JonT
Enthusiast

SKrehlik,

All I can tell you is that the error "Unable to access the specified host" when you attempt to add a host to VC seems to be kind of the VMware "general error". The VPXA is the Virtual Center "agent" that is installed when the host is added to VC, and it should be a running "service" during and after that. If you do not see it running on the host, then you need to troubleshoot that first. That may be your best lead.

One question for you: was this host added to a Virtual Center server prior to the upgrade, or are you just now trying to add it to VC? If it was, and it somehow got lost on the VC console, you may not have a choice but to rebuild the host. I never found the solution as to why my VC couldn't connect to my ESX3 host. I spent about 3 whole work days trying to troubleshoot the problem, and it took me about an hour to rebuild the hosts, add my VMs back, and finish connecting to VC.

Jon

JonT
Enthusiast

One other thing I just thought of: what version of Virtual Center are you using? Have you updated to 2.0 or are you still using a version of 1.x? All of the testing I did was with 2.0 and ESX3, so I had no updates involved. There may be a compatibility issue with moving a host that was set up on a 1.x VC to a "new" 2.0 VC.

Hope this helps.....

SKrehlik
Contributor

Jon,

Thanks for the reply. We were running ESX 2.5 standalone, without VC, using the web interface to manage it.

First, we installed Virtual Center 2.0 on the VC server, added this system as the local license server, installed our licenses, and loaded Virtual Client. Then we upgraded ESX to 3.0.1 (after taking some backups).

So, in essence, we never had Virtual Center running and our ESX host was never a part of a virtual center environment.

JonT
Enthusiast

OK, so you should be able to connect this host after the update. You stated that you cannot connect via the VI Client, but is that to the host or to the VC server? You should be able to connect to both via the VIC. Can you bring up the web page for the host? It should direct you to download the VI Client, but does the page even come up? I also hope you have done the simple network tests to your host to verify connectivity. You may also need to check on the console of the host that the ports used by the VI Client are not being blocked by the ESX firewall. I am not sure what takes place under the hood during an "upgrade", but I know that the management ports for the VIC in 2.x are different from those in 3.0.1.

neilhdavies
Contributor

I have also had this issue, with an HP BL25p G1 with 2 x AMD Opteron processors. However, I managed to get the system working again without a rebuild.

I upgraded a farm of five servers and four worked great. I worked through this post for things to try and eventually ran 'ps -ea | grep vm' on a working blade and on the non-working blade. The difference between them was a process called 'vmware-hostd'. I logged on via SSH to both blades and ran vmware-hostd by hand. On the working blade it sat there and waited like a good server process; on the blade that did not work, it caused a seg fault.
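
If you have saved the 'ps -ea' output from both blades, the process comparison can be sketched in Python like this. The function names are made up, and unlike grep it matches "vm" against the CMD column only, not the whole line.

```python
from typing import List, Set


def vm_processes(ps_output: str, needle: str = "vm") -> Set[str]:
    """Collect command names containing `needle` from `ps -ea` output."""
    names: Set[str] = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 3)  # PID, TTY, TIME, CMD
        # Skip the "PID TTY TIME CMD" header and any malformed lines.
        if len(fields) == 4 and fields[0].isdigit() and needle in fields[3]:
            names.add(fields[3].strip())
    return names


def only_on_working(working_ps: str, broken_ps: str) -> List[str]:
    """Commands seen on the working blade but not on the broken one."""
    return sorted(vm_processes(working_ps) - vm_processes(broken_ps))
```

In my case the only name this would have reported is vmware-hostd.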

After looking at the contents of /usr/sbin/vmware-hostd, I went to the library directory, /usr/lib/vmware/hostd, and compared the directories on both blades by file size.
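
The size comparison can be sketched in Python as well, assuming both copies of the directory are reachable from one machine (for example, after copying the hostd directory from each blade with scp). The names file_sizes and compare_trees are hypothetical.

```python
import os
from typing import Dict, List, Optional, Tuple


def file_sizes(root: str) -> Dict[str, int]:
    """Map each file's path (relative to root) to its size in bytes."""
    sizes: Dict[str, int] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # lstat does not follow symlinks, so broken links do not raise.
            sizes[os.path.relpath(full, root)] = os.lstat(full).st_size
    return sizes


def compare_trees(good_root: str, bad_root: str) -> List[Tuple[str, Optional[int], Optional[int]]]:
    """List (path, good_size, bad_size) for files that differ or exist on one side only."""
    good, bad = file_sizes(good_root), file_sizes(bad_root)
    report = []
    for rel in sorted(set(good) | set(bad)):
        g, b = good.get(rel), bad.get(rel)
        if g != b:
            report.append((rel, g, b))
    return report
```

A size of None means the file is missing on that side. Equal sizes do not prove equal contents, so comparing checksums would be a stricter check; size is just what I used.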

I then decided to replace the non-working files with a copy of the working files. After that, running 'vmware-hostd' no longer caused a seg fault, and I reloaded the blade. Everything came back as it was.

I haven't found which particular file caused the problem yet, but if you are desperate to get a system back up and running, this may point you in the right direction.
