rminick
Contributor

3.5 Hosts continually go into not responding in VC 2.5

I've opened an SR on this, but I wondered if anyone else is fighting this problem.

Several VI 3.5 hosts in VC 2.5 keep going into 'not responding'. Grrrr..

If I restart vmware-vpxa on the ESX host, VC looks fine again for a while, then after 5 minutes or so it all repeats. The CPU load on the affected hosts looks OK to me. If I connect the VC client directly to these hosts, I see nothing wrong. The VC server isn't getting clobbered either. I've had a lot of weird VC issues since upgrading to 2.5 from 2.0.2. I'm thinking about nuking the VC DB and server and doing a fresh build, but I'd rather not.

Richard J Minick, VCP

22 Replies
rminick
Contributor

After rebooting the VC server, no hosts are showing as not responding. After being up 3-5 minutes, the same 2 hosts show not responding again. It seems to be a VC problem.

Richard J Minick, VCP

admin
Immortal

When the host shows up as not responding, can you check whether vpxa is running on the host? Also check the vpxa logs (/var/log/vmware/vpx/vpxa*.log) for errors.
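
As a rough sketch of that log check, assuming Python 3 is available wherever the logs are read (the ESX service console may only ship an older interpreter, in which case the logs can be copied elsewhere and the pattern adjusted): this picks the most recently modified file matching the path quoted above and prints its last lines. The helper names `newest_log` and `tail` are made up for illustration and are not part of any VMware tool.

```python
import glob
import os
from collections import deque

# Log location quoted in the post; change it if the logs were copied elsewhere.
LOG_PATTERN = "/var/log/vmware/vpx/vpxa*.log"


def newest_log(pattern=LOG_PATTERN):
    """Return the most recently modified file matching pattern, or None."""
    paths = glob.glob(pattern)
    if not paths:
        return None
    return max(paths, key=os.path.getmtime)


def tail(path, count=50):
    """Return the last `count` lines of a text file, without newlines."""
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]


if __name__ == "__main__":
    path = newest_log()
    if path is None:
        print("no vpxa logs found")
    else:
        print("==> " + path)
        for line in tail(path):
            print(line)
```

Running it right after the host flips to not responding should show whatever vpxa logged last, which is probably the most useful part to post here.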

ALK-_ABELLO
Enthusiast

Hi rminick

I have the exact same issues as you describe, so I'm very much interested in the outcome. :)

The 3.5 hosts seem to work just fine, but the VC drives me nuts with these disconnections. The VC, in my case, is a fresh install. :(

/Rubeck

rminick
Contributor

Yes, it's very frustrating. I've been tailing the log and watching things after I recycled vpxa. It looked okay in VC for 3 or 4 minutes and even did 2 VMotions before it went to not responding again. I don't see anything of interest in the host logs. The log shows the agents running on its cluster buddies. I'll probably just rebuild VC and its DB, but I'll hit support a little more before I do that.

Richard J Minick, VCP

rmitchell9
Contributor

I have also noticed the same issue. Sometimes disconnecting and reconnecting will work, but then if I try to look at the networking config or something else on the host, it will go to not responding. It's almost like there is "lag" from the host making it "not respond".

rmitchell9
Contributor

OK, I have two hosts in a cluster. I took one out of the cluster, created a new folder in my datacenter, then added the host that I had removed (not in a cluster). It is responding and seems to be fine.

The VirtualCenter client seems much quicker to load as well.

I know it's not right, but I wanted to share that with you guys. So perhaps it has something to do with hosts in a cluster. I'm thinking about shutting down all VMs, removing the second host, then re-creating the cluster with both hosts, but I don't wanna be in trouble come Monday morning... :smileycool:

rmitchell9
Contributor

I have removed the cluster completely and both hosts are responding just fine. I have not recreated the cluster yet, but I think I will.

rmitchell9
Contributor

OK, I created the cluster and dragged my hosts into it. Waiting for failures.

rmitchell9
Contributor

Performed 1 migration. I have viewed host configuration and network configurations. It takes a few seconds for the initial display when switching from host to host, but all hosts continue to respond. Much better than "not responding". Hope this may work for you guys as well. ESX 3.5 and VirtualCenter Server 2.5.

rmitchell9
Contributor

Still working fine. I have added and removed pNICs from vSwitches and the VC server/client is responding very well. I have also migrated some VMs between the hosts, so I'm thinking I'm all set.

jc-rush
Contributor

I've got an 11-node DRS/HA cluster and I'm running into the same problem. One node in the cluster keeps dropping out. It didn't do this right away, but over the last two days it has been dropping out of VirtualCenter, and the only way to get it back is to reboot the node. Once DRS moves VMs back onto it, it drops again in about 5 minutes. I put the host into maintenance mode last night so I could take a look at it today. Since then, it has not dropped out of VirtualCenter at all.

I'm looking at ESX Update 1 and it seems that there are two patches that could potentially fix this (ESX350-200802401-BG & ESX350-200803217-UG). Has anyone applied ESX Update 1 yet? Does ESX Update 1 seem to resolve this issue?

Any help would be greatly appreciated.

rmitchell9
Contributor

I haven't had any problems since I deleted the cluster and then recreated it, back in Feb. I think it was. I'm not installing the new patch/update right away; I'm waiting to see how it works out for others, seeing as I'm not having any problems.

jc-rush
Contributor

Breaking an 11-node cluster with running machines isn't really an option in this case though... :(

rmitchell9
Contributor

I understand. Just wondering, did this happen after you updated to 3.5 and 2.5, or just out of the blue? Do you have a support contract with VMware? Any errors on the host's console?

jc-rush
Contributor

This is all a fresh install of 3.5/2.5, no upgrades whatsoever. What logs do I need to look at? I can post them here if you like.

rmitchell9
Contributor

Not sure about the logs. At one time I had some errors on the host console. We have a support plan with VMware and they were able to connect to the host via the web using web .. whatever it is ... mental block. They were very helpful. Just a suggestion. I'm not experienced enough to help you; I guess I got lucky when I solved the problem I was having.

jc-rush
Contributor

No problem. I'll place a call to VMware and see what they say.

Thanks for your help though!

jketron
Enthusiast

Start with a clean database and you will not have that issue. I did a VC upgrade and a fresh install, and the clients in the VC server would state they were not responding and then come back again, but if I pointed the VC client at the ESX host the VMs were actually fine. The problem was the database. I did an install with a clean database and it was fine.

admin
Immortal

Sounds like the VC agent (vpxa) on the host might be having problems. The logs are under /var/log/vmware/vpx on the host. Check for errors (grep for "error]") in the vpxa*.log files.
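
If grep isn't handy, the same search can be sketched in Python. This is only an illustration, assuming Python 3 and that the logs are readable at the path above (or at a copy of them, with the pattern changed to match). The function name `find_error_lines` is hypothetical; the only things taken from this thread are the log directory and the "error]" marker.

```python
import glob

# Path and marker as given in the post; both can be overridden.
LOG_PATTERN = "/var/log/vmware/vpx/vpxa*.log"
MARKER = "error]"


def find_error_lines(pattern=LOG_PATTERN, marker=MARKER):
    """Yield (path, line_number, text) for every line containing marker."""
    for path in sorted(glob.glob(pattern)):
        with open(path, "r", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if marker in line:
                    yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    for path, number, text in find_error_lines():
        print("%s:%d: %s" % (path, number, text))
```

The match is a plain substring test rather than a regular expression, so the closing bracket in "error]" needs no escaping. The file name and line number are printed so the hits can be lined up with the time the host dropped out of VC.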
