VMware Cloud Community
jmatos
Contributor

VC 2.5 Stops Responding After a While

Hi,

My VC upgrade went well, but now, after a few days, it stops responding until I restart the service (which takes a very long time to stop). After the restart, VC works fine for only a few minutes.

The error message is "the request failed because the remote server took too long to respond".

I have already restarted mgmt-vmware on the hosts and shrunk the database as well (260 MB on SQL Server 2005 Express).

Has anyone run into this issue?

Rod
Contributor

With input from VMware Technical Support I have prepared a solution document for this problem.

http://communities.vmware.com/docs/DOC-3290

scerazy
Enthusiast

After 3+ months of SQL Express working fine, it suddenly decided to go to 100% CPU.

So I decided to give the above procedure a go.

"The down side of this action is that you will lose any performance data that is currently stored in the DB"

Strangely, every step of the procedure finished fine and VC CPU utilization is back to normal, but none of the statistical data was destroyed!

I can query day, week, month, and year, and the returned graph is complete (and comes back fast).

Seb

joboo12
Enthusiast

Seb,

My experience was the same. I did not lose historical data. My concern is whether this process needs to be repeated periodically. Can anyone comment on this?

Thanks

scerazy
Enthusiast

Makes me feel better! Thanks for the info.

It seems that the jobs that would normally be run by SQL Server Agent (which does NOT exist in the Express edition) are not being run.

But they CAN be made to run with third-party utilities. Read more here:

http://communities.vmware.com/message/899995
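Without SQL Server Agent, one common workaround is to have Windows Task Scheduler call `sqlcmd` on a schedule. The minimal Python sketch below only builds such a command line. The server instance, database, and stored procedure names are hypothetical placeholders, not the real VirtualCenter job names, which you would need to look up for your own VC database.

```python
import re

# Hypothetical placeholders: substitute your own server\instance, database,
# and the stored procedure the missing SQL Server Agent job would have run.
SERVER = r"VCSERVER\SQLEXPRESS"
DATABASE = "VC_DB"
PROCEDURE = "dbo.example_rollup_proc"

# Accept only simple [schema.]name identifiers so nothing else ends up in -Q.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_sqlcmd(server: str, database: str, procedure: str) -> list[str]:
    """Return an argv list that runs one stored procedure via sqlcmd.

    -S selects the server\\instance, -E uses Windows authentication,
    -d selects the database, and -Q runs the query and then exits.
    """
    if not _IDENT.fullmatch(procedure):
        raise ValueError(f"unexpected procedure name: {procedure!r}")
    return ["sqlcmd", "-S", server, "-E", "-d", database,
            "-Q", f"EXEC {procedure}"]


if __name__ == "__main__":
    cmd = build_sqlcmd(SERVER, DATABASE, PROCEDURE)
    # Print the command so it can be pasted into a scheduled task; running it
    # from Python instead would be subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```

Keeping the command as a list avoids shell quoting problems if it is later passed to `subprocess.run`.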

Thanks

Seb

almessias
Contributor

Hi,

We did a fresh VC 2.5 installation on a brand-new server. Then we upgraded our existing ESX 3.0.2 servers to ESX 3.5 and added these upgraded servers to the new VC 2.5. After that, we installed 5 brand-new ESX 3.5 servers and added them to VC 2.5 too. But now we frequently get the "The request failed because the remote server took too long to respond" error on all ESX servers, even for a simple task like "Refresh Network Information"...

As an additional problem, it sometimes takes 5 to 10 minutes to log on with the VirtualCenter Client...

All these problems started only after migrating from VC 2.0.2/ESX 3.0.2 to VC 2.5/ESX 3.5!

Has anyone solved this problem?

Thanks a lot

ddugan
Contributor

Hi,

I'm new to VMware, but I've been testing it for a while. We have VC 2.5 installed on a physical machine with a license server and SQL 2005, and I'm having the same problem. If you do anything to one of the ESX servers in a cluster, there is a wait of 30 seconds or so; then all seems fine until you try to do a task, such as networking. I was adding a VMotion network and all went fine until I hit "Finish". Then I waited, and it eventually came back with the error "The request failed because the remote server took too long to respond"; however, the action does go through.

There is also the same issue when moving from one ESX server in the cluster to another: about a 30-second wait before it refreshes. And you can't do anything with it, not even move the VC window around on the desktop, until it comes back.

I checked my CPU utilization, but it hardly blips. This server has two dual-core processors and 16 GB of RAM, so I'm sure it isn't the hardware. Not to mention, it was working just fine until I joined it to our AD forest. I can't picture that being the issue, but that is when the trouble started. I then rebuilt the whole thing from scratch (I tried a few things beforehand), deleting the old databases and creating new ones. Same results. I'm at a loss at the moment. And this wasn't a migration; this was a fresh install!

Any thoughts would be great!

Thanks!

Vincent_Kemp
Contributor

We have been experiencing similar problems since upgrading from VC 2.0.2/ESX 3.0.2 to VC 2.5/ESX 3.5 Update 1. Actually, the deciding factor seems to be the upgrade of the ESX hosts to 3.5 Update 1. For example, when I start adding VM networks to a virtual switch, the first addition is processed immediately, but additional VM networks result in the "took too long to respond" error. I then wait a couple of minutes and try again. Again, the first addition completes successfully, but additional attempts time out again.

My hunch right now is that it has to do with a change in the way the licensing process works. Our VC server is also the license server. It is multihomed: one NIC connects to a management network where the SQL server and the Active Directory that VC uses reside (the VC server is also a member of that domain), and the other NIC connects to the network where the ESX hosts live. We can see in one of our firewalls that, every time a timeout occurs, the upgraded hosts try to connect to the 'management' IP on TCP 27000 through the 'ESX' network. Why the ESX hosts try to do this is a complete mystery, because we specify the license server explicitly by IP address, not hostname, to prevent lookup confusion caused by the multihoming. The ESX hosts shouldn't even know the 'management' IP address.
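On a multihomed server it can help to confirm which local address the operating system will actually pick to reach a given peer. A minimal Python sketch, assuming UDP sockets are allowed: calling `connect()` on a UDP socket sends no packets but makes the OS choose a route, and `getsockname()` then reports the chosen source address. The peer address in the example is a hypothetical placeholder, and this is one possible check, not an explanation of the licensing behavior described above.

```python
import socket


def source_address_for(peer_ip: str, peer_port: int = 27000) -> str:
    """Return the local IP the OS routing table would use to reach peer_ip.

    connect() on a UDP socket only records the default destination and
    selects a route; no datagram is sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((peer_ip, peer_port))
        return s.getsockname()[0]


if __name__ == "__main__":
    # Hypothetical ESX host address; replace it with one of your hosts.
    try:
        print(source_address_for("192.0.2.10"))
    except OSError as exc:
        print(f"no route to host: {exc}")
```

Run on the VC server, a result on the management subnet for an ESX host address would suggest traffic is leaving through the wrong NIC.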

We are now changing our setup by removing the multihoming to see if this will solve the issue.

ddugan
Contributor

That was our problem too, and yes, it is the license server. You have to open TCP ports 27000 and 27010 in the Windows firewall to allow access to the license server. Once we did that, all my problems vanished. Well, at least with ESX, for the moment.
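To check whether those two ports are actually reachable after changing the firewall, a plain TCP connect test is enough. A minimal Python sketch, assuming it is run from a machine on the same network as the ESX hosts; the license server address is a hypothetical placeholder.

```python
import socket

# Ports mentioned in this thread for the license server.
LICENSE_PORTS = (27000, 27010)


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_license_server(host: str,
                         ports: tuple[int, ...] = LICENSE_PORTS,
                         timeout: float = 3.0) -> dict[int, bool]:
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical license server address; replace it with your own.
    print(check_license_server("192.0.2.20", timeout=1.0))
```

A `False` for either port points at a firewall or routing problem between that network and the license server.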

Daren

Vincent_Kemp
Contributor

We moved the VC server to a management VLAN and removed all other network connections to make sure the server was no longer multihomed, and we changed the way routing and firewalling were performed between the VC server and the hosts. We were forced to reconnect all hosts because the IP address of the VC server changed. We then had to reboot each individual host because the gateway and DNS server address changes did not seem to 'take' correctly. After rebooting, the hosts act normally again and management runs smoothly. The exception is a single remaining 3.0.2 host, which we will upgrade later this week; it still gives us long delays and timeouts on management tasks.

Quite an exercise, this combined VC and ESX update, causing more issues than we anticipated (a one-day job extended into 3 days). Luckily, we did not experience any downtime on the VMs, thanks to DRS and VMotion doing their intended job.

A13x
Hot Shot

I seem to be having exactly the same issue, but I have disabled the firewall in our test environment to rule it out, and I am still experiencing long delays and eventual timeouts from the ESX hosts. I have yet to add the hosts to the license server, which is also on the same box as VC. ESX 3.5 seemed to work fine; the issues started only after upgrading both ESX and VC to Update 1. Is there a way to change the port used by the license server/lookup?
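The ports a FlexNet-style license server listens on are normally set in the license file: the optional port field after the host ID on the `SERVER` line (for lmgrd) and the optional `PORT=` setting on the `VENDOR` line (for the vendor daemon). A minimal Python sketch that reads them, assuming the file follows that usual layout; the sample header and vendor name are hypothetical, not copied from a real VMware license file.

```python
import re

# Hypothetical FlexNet-style license header (assumed, not a real file).
SAMPLE = """\
SERVER this_host ANY 27000
VENDOR example_vendor PORT=27010
USE_SERVER
"""


def license_ports(text: str) -> dict[str, int]:
    """Extract the lmgrd port (SERVER line) and vendor daemon port (VENDOR PORT=)."""
    ports: dict[str, int] = {}
    for raw in text.splitlines():
        fields = raw.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = fields[0].upper()
        if keyword == "SERVER" and len(fields) >= 4 and fields[3].isdigit():
            ports["lmgrd"] = int(fields[3])
        elif keyword == "VENDOR":
            for field in fields[2:]:
                match = re.fullmatch(r"PORT=(\d+)", field, re.IGNORECASE)
                if match:
                    ports["vendor"] = int(match.group(1))
    return ports


if __name__ == "__main__":
    print(license_ports(SAMPLE))  # {'lmgrd': 27000, 'vendor': 27010}
```

Checking these values in the real license file shows which ports the firewall rules need to match.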
