"unable to access the specified host, It either does not exist, the server software is not responding, or there is a network problem"
I've got a VC server with no firewall between it and the ESX hosts (all of them are showing disconnected), and everything is on the same segment. If I telnet to the ESX host from the Windows box using telnet servername 902, it connects OK. But if I try to add the host, it fails with the message above.
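For anyone who wants to script the telnet check described above, here is a minimal Python sketch. The function name port_is_open and the 3 second timeout are my own choices, not anything from VMware. It only proves that something accepts TCP connections on the port, which, as this post shows, does not guarantee that adding the host will work.

```python
import socket


def port_is_open(host, port=902, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Rough equivalent of 'telnet servername 902': it only checks that the
    port accepts a connection, not that the service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Calling port_is_open("servername") from the Windows box would then be the scripted version of the manual test, with "servername" standing in for the real host name.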
I've verified there is no additional backup file in /etc/vmware/firewall, which can cause this issue.
If I try to add the host with an incorrect user and password, it actually tells me that the password is incorrect. If I try adding it using the root account, I get the error above.
Windows firewall is disabled on the VC server.
esxcfg-vswif -l gives me output showing the correct IP bound to vswif0:
Name    Port Group       IP Address    Netmask        Broadcast      Enabled  DHCP
vswif0  Service Console  192.168.0.21  255.255.255.0  192.168.0.255  true     false
Anyone have any suggestions on what I should check next?
Hi, can you let me know how you narrowed it down to VPXuser?
I've narrowed my problem down to the services.xml file being altered, which caused the host to lose connection once the mgmt-vmware restart was issued. I've been able to replicate this and realised the format of what I added to the file caused it. The trouble is that once I copied the backup of /etc/vmware/firewall/services.xml back to the original location and rebooted, the server was still broken. I flushed the iptables rules using iptables -F, but this didn't work as I was hoping it would. Due to time I had to get the system back up and running, so I just rebuilt it, choosing to keep the VMFS partitions. Maybe over Christmas I'll get a chance to break it again and try to figure out how to fix it and what's actually occurring. This is all covered under my other post here:
http://www.vmware.com/community/thread.jspa?threadID=65226&tstart=60
To delete a user from the console you can use deluser
I found that if I was connected to the VC server AND the host using two separate VI Clients I saw an error on the Host basically saying there was a duplicate key, specified user already exists.
OK, here is where I am now. I found that removing the VPXUser account (the command is userdel, not deluser, for ESX3) allows the VC server to re-create the account when I attempt to add the host, and I no longer get the duplicate key error above. The VC Server still returns the "Unable to access the specified host" error in the VI Client. I am trying to look at the services.xml file, but again I am a Linux newb, so I am unfamiliar with the ESX command line and commands. I really should get a "Linux for Dummies" type book. Just FYI, I had come across your post about the services.xml file before posting to this thread about my problem, but at the time the user account issue seemed more relevant due to the error I was seeing. Also note that throughout all of my problems here I have never had a problem connecting to any of my hosts using the VI Client or the web interfaces.
Message was edited by:
JonT
Oh, and also when I remove the "disconnected" object from the VC server's Inventory, the Host records the "Remove Entity Permission" as successful twice. How do I crack open that services.xml file to check it? I have been trying to login to my host using both PuTTY and Secure CRT but it will not authenticate me with my root account/password for some reason.
Any ideas?
Hi. Firstly, sorry for my typo on the command to delete the account.
ESX3 is locked down so you can't PuTTY in as root, but you can log in as a regular user and then use su - and supply the root password. Or you can go to the console and run vi /etc/ssh/sshd_config, go down and change the line "PermitRootLogin no" to "PermitRootLogin yes", and then issue the command "service sshd restart". You should now be able to log in via PuTTY using the root account.
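If you would rather not do the edit by hand in vi, the same change can be sketched in Python. This is only an illustration working on the file contents as text; the function name enable_root_login is made up, and you would still need to write the result back to /etc/ssh/sshd_config and run "service sshd restart" yourself.

```python
import re

# Matches an active (uncommented) PermitRootLogin line, whatever its value.
_PERMIT_ROOT = re.compile(r"^[ \t]*PermitRootLogin[ \t]+\S+[ \t]*$", re.MULTILINE)


def enable_root_login(config_text):
    """Return sshd_config text with every active PermitRootLogin set to yes.

    If no active PermitRootLogin line exists, one is appended at the end.
    """
    new_text, replaced = _PERMIT_ROOT.subn("PermitRootLogin yes", config_text)
    if replaced == 0:
        new_text = config_text.rstrip("\n") + "\nPermitRootLogin yes\n"
    return new_text
```

Commented-out lines (starting with #) are deliberately left alone so the original file stays readable.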
To be honest, if you have not altered the services.xml file previously, I doubt this will be the problem.
Ok well I am almost to a dead end on this, but I will build another fresh ESX3 Host and compare it to the disconnected host. My only worry is that if we run into this situation in production we could be hurting. Basically what started all of this was a power outage to the Lab (Seattle Winds last week).
I found it easier to just rebuild after spending many hours trying to work it out. On the ESX host that won't connect, can you run the command "iptables -L" and post the output here so I can compare it to mine, which is working after the rebuild?
Ok, like you I am going to give up and rebuild. The one thing to note here is that I was able to add and remove my newly built host several times, but to replicate the problem I shut it down and "removed" the host from VC. That seems to be the trigger to permanently disconnect a host. Here is the output from the "iptables" command you asked me to run on the disconnected host (quite long actually):
[root@waprdbrvmwh01 root]# iptables -L
Chain INPUT (policy DROP)
target prot opt source destination
ACCEPT all -- anywhere anywhere
valid-tcp-flags tcp -- anywhere anywhere
valid-source-address !udp -- anywhere anywhere
valid-source-address-udp udp -- anywhere anywhere
valid-source-address tcp -- anywhere anywhere tcp flags:SYN,RST,ACK/SYN
icmp-in icmp -- anywhere anywhere
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT tcp -- anywhere anywhere tcp dpt:vmware-authd state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:http state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:https state NEW
ACCEPT udp -- anywhere anywhere udp spts:bootps:bootpc dpts:bootps:bootpc
ACCEPT udp -- anywhere anywhere udp dpt:svrloc
ACCEPT tcp -- anywhere anywhere tcp dpt:svrloc state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:5989 state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:ssh state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:5988 state NEW
Chain FORWARD (policy DROP)
target prot opt source destination
Chain OUTPUT (policy DROP)
target prot opt source destination
ACCEPT all -- anywhere anywhere
valid-tcp-flags tcp -- anywhere anywhere
icmp-out icmp -- anywhere anywhere
ACCEPT udp -- anywhere anywhere udp spts:1024:65535 dpt:domain
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT tcp -- anywhere anywhere tcp dpt:vmware-authd state NEW
ACCEPT udp -- anywhere anywhere udp spts:bootps:bootpc dpts:bootps:bootpc
ACCEPT udp -- anywhere anywhere udp spt:svrloc
ACCEPT tcp -- anywhere anywhere tcp spt:svrloc state NEW
ACCEPT udp -- anywhere anywhere udp dpt:902 state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:27000 state NEW
ACCEPT tcp -- anywhere anywhere tcp dpt:27010 state NEW
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
Chain icmp-in (1 references)
target prot opt source destination
ACCEPT icmp -- anywhere anywhere icmp echo-reply
ACCEPT icmp -- anywhere anywhere icmp echo-request
ACCEPT icmp -- anywhere anywhere icmp fragmentation-needed
DROP all -- anywhere anywhere
Chain icmp-out (1 references)
target prot opt source destination
ACCEPT icmp -- anywhere anywhere icmp echo-request
ACCEPT icmp -- anywhere anywhere icmp echo-reply
DROP all -- anywhere anywhere
Chain log-and-drop (7 references)
target prot opt source destination
LOG all -- anywhere anywhere LOG level debug tcp-options ip-options
DROP all -- anywhere anywhere
Chain valid-source-address (2 references)
target prot opt source destination
DROP all -- localhost.localdomain anywhere
DROP all -- 0.0.0.0/8 anywhere
DROP all -- anywhere 255.255.255.255
Chain valid-source-address-udp (1 references)
target prot opt source destination
DROP all -- localhost.localdomain anywhere
DROP all -- 0.0.0.0/8 anywhere
Chain valid-tcp-flags (2 references)
target prot opt source destination
log-and-drop tcp -- anywhere anywhere tcp flags:FIN,SYN,RST,PSH,ACK,URG/NONE
log-and-drop tcp -- anywhere anywhere tcp flags:FIN,ACK/FIN
log-and-drop tcp -- anywhere anywhere tcp flags:PSH,ACK/PSH
log-and-drop tcp -- anywhere anywhere tcp flags:ACK,URG/URG
log-and-drop tcp -- anywhere anywhere tcp flags:FIN,SYN/FIN,SYN
log-and-drop tcp -- anywhere anywhere tcp flags:SYN,RST/SYN,RST
log-and-drop tcp -- anywhere anywhere tcp flags:FIN,RST/FIN,RST
I did my own comparison between a connected and a disconnected host. The disconnected one is missing the following under "Chain INPUT" and "Chain OUTPUT":
ACCEPT tcp -- anywhere anywhere tcp dpts:2050:5000 state NEW
ACCEPT udp -- anywhere anywhere udp dpts:2050:5000 state NEW
ACCEPT tcp -- anywhere anywhere tcp dpts:8042:8045 state NEW
ACCEPT udp -- anywhere anywhere udp dpts:8042:8045 state NEW
Of course I got ahead of myself. The ports missing belong to the "EMC AAM Client".
Somehow I don't think this is the problem. Oh well, I will leave one "disconnected" for further analysis, but the others are getting a rebuild.
Thanks for the suggestions.
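The by-eye comparison of "iptables -L" output between a connected and a disconnected host can also be scripted. Below is a rough Python sketch under a couple of assumptions: the two listings have been saved as text with one rule per line (not wrapped by the terminal as in the paste above), and the function names parse_iptables_listing and missing_rules are just names I picked for illustration.

```python
def parse_iptables_listing(text):
    """Map each chain name to the set of its rule lines.

    Whitespace is collapsed so that column alignment differences between
    two hosts do not show up as false mismatches.
    """
    chains = {}
    current = None
    for raw in text.splitlines():
        line = " ".join(raw.split())
        if not line:
            continue
        if line.startswith("Chain "):
            current = line.split()[1]
            chains[current] = set()
        elif line.startswith("target ") or current is None:
            # Column header line, or text before the first chain (e.g. a prompt).
            continue
        else:
            chains[current].add(line)
    return chains


def missing_rules(reference, suspect):
    """Per chain, list rules found in the reference listing but not in the suspect one."""
    ref = parse_iptables_listing(reference)
    sus = parse_iptables_listing(suspect)
    diff = {}
    for chain, rules in ref.items():
        gone = rules - sus.get(chain, set())
        if gone:
            diff[chain] = sorted(gone)
    return diff
```

Feeding it the listing from a working host as reference and the one from the disconnected host as suspect should surface differences like the four dpts:2050:5000 and dpts:8042:8045 rules quoted above, without having to scan the whole output manually.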
Any thoughts on just reinstalling the VPX agent, copied from "\program files\vmware\Virtual center 2\upgrade" (or similar, it's late here)?
Check the XML in the folder to see what version you need.
SCP it to the affected hosts.
Log on to the host and su - to root.
Then sh "file name from the XML file at location you copied it to".
Might work, might not. Won't matter to test if you're thinking of rebuilding anyway.
I rebuilt the affected hosts. Took me about 20 minutes to get 3 hosts back into my VC with the same VM's still available (on an MSA1000 shared). Lesson learned is that while this problem should be resolved, rebuilding is much faster than troubleshooting.
OK, just like the rest of you I am seeing this error. Except I can't add my ESX host at all. I can't even get to it with the VI Client.
Here's the kicker: after a bunch of troubleshooting, including digging around in /var/log/* files, reading all kinds of logs, running chkconfig --list and digging through directories, I find no mention of vpxa. I think I might be missing that module completely.
My boss just ran the upgrade yesterday but I wasn't there to "watch over his shoulder" so I don't know which error messages, if any, came up. I do know he ran the upgrade twice.
To pre-empt some questions -
It's an IBM rackmount VM ESX 3.0 host (small implementation, < 10 VMs)
same subnet and physical net segment as WinXPP VC / License Server
static IP (verified through ifconfig as correct IP on correct NIC)
IPTABLES and Firewall are empty on ESX / Windows Firewall is disabled
PuTTY and telnet to 902 connect just fine
DNS resolves (using IP to attempt adding it didn't help)
using invalid username/pass gives authentication error
using valid username/pass gives the topic error
I've even generated new/combined license files and verified them on the license server.
Any and all ideas would be greatly appreciated.
SKrehlik,
All I can tell you is that the error "Unable to access the specified host" when you attempt to add it to VC seems to be kind of the VMware "general error". The VPXA should actually be a running "service" during and after you add the host to Virtual Center. The VPXA is the Virtual Center "agent" that is installed during the addition to VC. If you do not see this running on the host, then you need to troubleshoot that first. That may be your best lead. One question for you: was this host added to a Virtual Center server prior to the upgrade, or are you just now trying to add it to VC? If it was and somehow got lost on the VC console, you may not have a choice but to rebuild the host. I never found the solution as to why my VC couldn't connect to my ESX3 host. I spent about 3 whole work days trying to troubleshoot the problem, and it took me about an hour to rebuild the hosts, add my VMs back, and finish connecting to VC.
Jon
One other thing I just thought of: what version of Virtual Center are you using? Have you updated to 2.0 or are you still using a version of 1.x? All of the testing I did was with 2.0 and ESX3, so I had no updates involved. There may be a compatibility issue with moving a host that was set up on a 1.x VC to a "new" 2.0 VC.
Hope this helps.....
Jon,
thanks for the reply. We were running ESX 2.5 as a stand alone without VC using the web interface to manage it.
First, we installed Virtual Center 2.0 on the VC server, added this system as the local license server, installed our licenses, and loaded Virtual Client. Then we upgraded ESX to 3.0.1 (after taking some backups).
So, in essence, we never had Virtual Center running and our ESX host was never a part of a virtual center environment.
Ok, so you should be able to connect this host after the update. You stated that you cannot connect via the VI Client, but is that to the host or to the VC server? You should be able to connect to both via the VIC. Can you bring up the web page for the host? It should direct you to download the VI Client, but does the page even come up? I also hope you have done the simple network tests to your host to verify connectivity. You may also need to check on the console of the host that the ports used by the VI Client are not being blocked by the ESX firewall. I am not sure what takes place under the hood during an "upgrade", but I know that in 2.x the management ports are different than with 3.0.1 for the VIC.
I have had this issue also with an HP BL25p G1 with 2 x AMD Opteron processors. However, I have managed to get the system back working without a rebuild.
I upgraded a farm of five servers and four worked great. I worked through this post for things to try and eventually tried a 'ps -ea | grep vm' on a working and the non-working blade. The difference between them was a process called 'vmware-hostd'. I logged on via SSH to both blades and ran the command. The working blade sat there and waited like a good server process, however, the blade which did not work caused a seg fault.
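If you want to do the same working vs. non-working process comparison without eyeballing two terminals, something like the following Python sketch would do it. It assumes you have captured the plain "ps -ea" output from each blade as text (PID, TTY, TIME, CMD columns); the function names are hypothetical.

```python
def vm_processes(ps_output):
    """Return the set of command names containing 'vm' from 'ps -ea' output."""
    names = set()
    for line in ps_output.splitlines():
        # Columns are PID, TTY, TIME, CMD; keep CMD intact even if it has spaces.
        fields = line.split(None, 3)
        if len(fields) == 4 and "vm" in fields[3]:
            names.add(fields[3].strip())
    return names


def only_on_working(working_ps, broken_ps):
    """Commands matching 'vm' that run on the working host but not on the broken one."""
    return sorted(vm_processes(working_ps) - vm_processes(broken_ps))
```

This mirrors the 'ps -ea | grep vm' step: whatever it returns is a candidate for the process that is failing to start on the broken blade.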
After looking in the contents of /usr/sbin/vmware-hostd I went to the library directory of /usr/lib/vmware/hostd and compared by file size the directories on both blades.
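Comparing two directory trees by file size can be sketched in Python as below. This assumes both copies of the directory are reachable from one machine (for example, the non-working blade's /usr/lib/vmware/hostd copied or mounted somewhere), and the function names are my own. Equal sizes do not prove identical contents, so treat a clean result as a hint rather than proof.

```python
import os


def size_manifest(root):
    """Map each file's path (relative to root) to its size in bytes."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):  # skips broken symlinks
                manifest[os.path.relpath(full, root)] = os.path.getsize(full)
    return manifest


def size_mismatches(working, broken):
    """Return {path: (working_size, broken_size)} for every differing file.

    A size of None means the file is missing on that side.
    """
    diffs = {}
    for path in sorted(set(working) | set(broken)):
        a, b = working.get(path), broken.get(path)
        if a != b:
            diffs[path] = (a, b)
    return diffs
```

The files reported by size_mismatches are the ones worth replacing first with copies from the working blade.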
I decided then to replace the non-working files with a copy of the working files. Then, running 'vmware-hostd' no longer caused a seg fault and reloaded the blade. Everything came back as was.
I haven't found which particular file caused the problem yet, but if you are desperate to get a system back up and running, this may point you in the right direction.