I've partially upgraded my two-node VI cluster - one node has been upgraded to ESX 3.5, and my VirtualCenter server has been upgraded to 2.5. The database also upgraded successfully, after I ran the upgrade manually and tweaked the permissions on MSDB so that my vclogin user had DBO rights.
However, when I started the 2.5 VI Client, it attempted to upgrade the agent on the other, 3.0.2 ESX node, which hung in the process - it was stuck at 80% for 40 or so minutes (a coffee break, walk around the garden, read the paper-length interval).
During this time, the VMs it was hosting stopped responding as well (I had VMotion'd them to it so the other node could be upgraded). So I rebooted it.
After the reboot, and kicking the SAN, the server came up and I could connect to it directly with the VI Client, and the VMs it is hosting work normally - this includes the VCMS management server! I could also connect the VI Client to the VCMS, but the VCMS says the node is disconnected. I have deleted the node from the VCMS and attempted to reconnect it. The reconnection wizard recognises the hardware and the VMs running on it, but it cannot connect, failing with the following error:
Unable to access the specified host. It either does not exist, the server software is not responding, or there is a network problem.
I'm guessing it's the middle option, since:
- I can ping the node's service console and vmkernel IP addresses from itself, the VCMS, and another machine
- I can connect to it directly with the VI Client
- The VCMS hosted on it talks to the other (3.5.0) node ok
- When I start the VI Client to connect to the 3.0.2 node, it calls up version 2.0 whereas connecting to the 3.5 node calls up version 2.5...
How do I upgrade the agent manually?
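The checks listed above show that the node answers pings and accepts a direct VI Client session, but not whether the management ports themselves accept connections from the VCMS. A minimal sketch of a TCP-level check follows. The port numbers 443 and 902 are an assumption about what the VI Client and VirtualCenter use with this generation of ESX, and the host name in the usage comment is hypothetical.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports=(443, 902)):
    """Return {port: reachable} for each port (443 and 902 are assumed defaults)."""
    return {port: tcp_port_open(host, port) for port in ports}


# Usage, run from the VCMS machine (hypothetical host name):
#   print(check_ports("esx-node2.example.com"))
```

If a port reports closed from the VCMS but open from another machine, that points at name resolution or filtering between the two rather than at the agent itself.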
>>> On Thu, Feb 7, 2008 at 11:47 PM, in message
A new message was posted in the thread "ESX 3.0.2 "disconnected" after upgrade to VC 2.5":
Author : fscked
As a follow-up, the problem ended up being as follows.
The host could resolve itself via both FQDN and short name, but to different IPs: one was the actual IP and the other was the loopback. Changing localhost and localhost.localdomain to point to the loopback, and the real hostname and FQDN to the actual IP, resolved it all.
Apparently it has something to do with the host connecting from its own IP to its own IP: using different forms of the same hostname results in a different IP being used, which causes the connection to fail.
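The mismatch described above can be spotted by reading the host's name-to-address table. The sketch below assumes the entries live in a standard hosts-format file (such as /etc/hosts on the service console, which the post does not name explicitly). The host names and addresses in the sample data are hypothetical.

```python
import ipaddress


def parse_hosts(text):
    """Map each name in hosts-format text to the first IP listed for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping.setdefault(name.lower(), ip)
    return mapping


def check_self_resolution(text, short_name, fqdn):
    """Return a list of problems with how the host's own names resolve."""
    mapping = parse_hosts(text)
    problems = []
    for name in (short_name, fqdn):
        ip = mapping.get(name.lower())
        if ip is None:
            problems.append(f"{name}: no entry")
        elif ipaddress.ip_address(ip).is_loopback:
            problems.append(f"{name}: maps to loopback {ip}")
    ips = {mapping.get(n.lower()) for n in (short_name, fqdn)} - {None}
    if len(ips) > 1:
        problems.append(f"{short_name} and {fqdn} map to different addresses")
    return problems


# Hypothetical sample data: the short name sits on the loopback line.
BROKEN = """127.0.0.1     esx01 localhost localhost.localdomain
192.168.1.10  esx01.example.com
"""

# Loopback names and real names kept on separate lines.
FIXED = """127.0.0.1     localhost.localdomain localhost
192.168.1.10  esx01.example.com esx01
"""
```

Calling `check_self_resolution(BROKEN, "esx01", "esx01.example.com")` reports both the loopback mapping and the address mismatch, while the same call on `FIXED` returns an empty list.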
I upgraded to VC2.5 this weekend and am now having this problem on one of my clusters.
I've manually installed the agent on the ESX servers, restarted services on ESX and rebooted the blades, but still can't connect. The licence server is up and running. All I get is the "Connect failed:could not connect to host" error. I can ping, PuTTY and WinSCP onto all the blades, so I'm not sure where the problem lies.
I've logged a call with Dell and they have recommended upgrading to ESX 3.5, but this is not something I want to do without proper planning.
Does anyone have a suggestion on what else I could try?