VMware Cloud Community
jmatos
Contributor

VC 2.5 Stops Responding After a While

Hi,

My VC upgrade went well, but now, after a few days, it stops responding until I restart the service (which takes a very long time to stop). After the restart, VC works fine for only a few minutes.

The error message is "the request failed because the remote server took too long to respond".

I've already restarted mgmt-vmware on the hosts and shrunk the DB as well (260 MB with SQL 2005 Express).

Has anyone else run into this issue?

49 Replies
scerazy
Enthusiast

I do not know why or how, but after my move from MSDE with VC 2.0.2 to SQL Express 2005 with VC 2.5 I still get the whole view of the past (week, month, custom, etc.).

Am I lucky or what?

Seb

cryptonym
Enthusiast

Hairyman,

The stated problem should not exist on SQL2005 Standard, only SQLExpress. SQLAgent is a part of SQL2005 Standard, but was removed by MS in SQLExpress. I'd anticipate no issue within SQL2005...but it's not free either.

cryptonym
Enthusiast

Very curious... can you tell me how long ago you upgraded? The reason I ask is that my setup appeared operational for a while. In reality, since I had preserved my performance data, it was showing the old data, but it was not collecting any new data in the month and week views. After a while this manifested itself with the old data moving out of scope, hence the "no data" type of error. According to the tech I spoke with, it should not be collecting data, and the SQL scripts seem to confirm this as they skip over the rollup code if the version is 'Express Edition'.

Warren

scerazy
Enthusiast

Just under 3 weeks now.

For every value (CPU, memory, disk, network etc) I do get data returned for a DAY, WEEK, MONTH, YEAR

Data seems to be consistent & correct

How I did it:

I was using MSDE on the VC server. I stopped the VC service, detached the DB from MSDE, stopped and disabled the MSDE services (including the agent), installed SQL Express 2005 on the same VC server, attached the DB to SQL Express 2005, changed the system DSN (ODBC), and upgraded VC to 2.5, pointing it to the existing DB (now on SQL Express 2005).

Seb

akmolloy
Enthusiast

I'll add a "Me too" to this post.

I upgraded to VC 2.5, and in the process upgraded from MSDE to Express. I'm having connection problems to the VIC, loss of historical performance data, and high CPU spikes that take up all of one processor with the sqlservr.exe process.

I don't currently have a 2005 Standard installation to connect to, so I need to use a local DB. I remember reading this is supported in production on up to 5 (?) ESX servers. We have 3.

Is the only fix at this point to reinstall VCMS from scratch? Is there any way of keeping historical data at this point? What features do I lose now compared with when I had MSDE?

-Tony

scerazy
Enthusiast

I really have no explanation, as I am not a DB guru (I can get by administering one a bit).

Your post reads: "...process upgraded from MSDE to Express..."

In my case I MANUALLY upgraded the DB BEFORE upgrading the VC

Maybe this made a difference?

Seb

jhanekom
Virtuoso

Some said VMware tech support stated "it doesn't work 'cause we now do statistics rollup with SQL Agent, not VirtualCenter." Firstly, that doesn't gel with the fact that many of us are seeing flatlining SQL CPU utilisation on upgraded databases. I've dissected the SQL queries being run, and it once again appears to be in (drum roll, please) the statistics rollup procedures.

These were a mess in VC 2.0 RTM and they certainly appear to be a mess again now, at the very least for upgraded databases.

Granted, they've evolved enormously since the original VC 2.0 RTM and are insanely complex to get right, but given the problems with them in VC 2 I'd have hoped QA would be better this time around.

SQL Agent isn't the only way to run tasks on a regular basis. And if it was the case that statistics rollup only happened with SQL Agent, why am I seeing 100% CPU utilisation on my SQL Express Server in the rollup procedures? Either this is a bug, or someone is lying, or both.

(Regarding support for MSDE vs Express in production environments: granted, but I cannot justify spending several hundred just for a full SQL license in my lab environment. Also think about smaller deployments, such as we're likely to see now that VC Foundation has been announced.)

Dave_Mishchenko
Immortal

> SQL Agent isn't the only way to run tasks on a regular basis. And if it was the case that statistics rollup only happened with SQL Agent, why am I seeing 100% CPU utilisation on my SQL Express Server in the rollup procedures? Either this is a bug, or someone is lying, or both.

Have you had a chance to run a SQL trace to see how the update queries are processing? It might be the case that you need to update indexes / statistics or add indexes to speed things up. Do you recall having an optimization job set up when you had MSDE?

jhanekom
Virtuoso

Yes, I have run a trace. But it's proven difficult to track, since the transaction that they're running to sum the data is quite large and ends up being hidden amongst many other transactions. Maybe I've just not set up the correct filters yet.

Anyway, it appears that it gets stuck in either load_stats_proc or stats_rollup1_proc (or both). Both contain statements that match what I'm seeing on SQL 2005's activity monitor and SQL Profiler.

I didn't use MSDE before - I've only used SQL Express in my installation. I also have a job (using ExpressMaint) that rebuilds the indexes on a weekly basis, so I'm confident that's not the problem.

ante
Contributor

We've had the exact same problem with upgrading from 2.0 to 2.5. The MSDE worked fine, went to SQL Express and everything tipped over... Now we've moved to a separate SQL 2000 SP4 machine (soon to be upgraded to SQL2005, I do anticipate some problems then as well, since there are different ODBC drivers in 2000/2005...) and everything is dandy for now...

/A

formulator
Enthusiast

Same problems here. I upgraded two VirtualCenter servers the same way. One is fine; the other is not, as far as CPU utilization goes. For the historical data, both servers show only real-time for some things and the full daily, weekly, etc. for others. I also have an issue with some users who have restricted permissions getting crazy errors in the client for no apparent reason. These are problems that I think should have been caught or resolved earlier. Unfortunately I don't have time to run around chasing the problem down and opening support tickets that will lead me in circles, so today I'm going to reinstall VC on the server I'm having trouble with, or maybe just go back to 2.0.2.

I had to do a SAN switch upgrade over the weekend that involved a lot of vmotioning and powering off/powering on guests and because of the problem with CPU utilization I had to restart the VC service and SQL service about 10 to 15 times. It was VERY annoying.

cryptonym
Enthusiast

From reading all the posts here, there are two different behaviors: some with performance freezes, others with feature loss. Since I only reported a loss of data, and am seeing CPU typically < 10%, the VMware techs came back to me with the "expected behavior" excuse, but it certainly sounds like a much more complex problem than they've let on to me. This just gets stranger by the minute.

Hopefully my problem will be fixed soon, as the DBA servicing my area returns to work later this week. I hope to be on SQL 2005 Server by the end of the week, which, by all accounts, fixes all this.

Thanks for all the info

Warren

paulkbeyer
Contributor

As requested by support, I've created a new DB in SQL Express 2005, told VC to set itself up in that DB and use it, and joined all the hosts to the new VC DB instance. So far: acceptable CPU utilisation on the VC server (5-10% average, as opposed to the 100% that the SQL service was consuming) and no more unresponsiveness or client crashes. I will update when I hear more from support tomorrow.

akmolloy
Enthusiast

Hi paulkbeyer, if you get a chance, can you post detailed steps on what you did? I'd love to save time and not call support. I'm assuming you lost historical performance data?

-Tony

CCJNL
Enthusiast

We experienced the SAME issue with our VirtualCenter server after upgrading to 2.5.

We originally used MSDE. Before installing VC 2.5 we upgraded MSDE to SQL 2005 Express. All went smoothly, and so did the upgrade to VC 2.5.

However, in the following days we could not log in, and when we could log in we could not stay logged in. We got all sorts of timeout and server-busy errors.

We opened a ticket with VMware support (via IBM) and they told us we had DB issues. We opted for the quick-fix route, which was to uninstall VC 2.5, uninstall SQL 2005 Express, and do the install of VC 2.5 over from scratch.

It worked like a charm and we have been smooth sailing ever since. It only took us about 20 min to install VC and configure it exactly like it was before.

jobo
Contributor

After the upgrade to VC 2.5, our VC server (or, more correctly, SQL Server Express) started to "hog" the CPU frequently.

I opened the VC database with SQL Server Management Studio Express and manually executed the stored procedures dbo.purge_stat and dbo.stats_rollup (actually there are three of each).

I also had to increase the maximum size of the transaction log because it hit the roof (in my case it had a limit of 500 MB).

Now VC is running fine, without all the fuss about reinstalling. I guess the problem will show up again in the future, because SQL Express doesn't include the SQL Agent, which is in fact what does the thing I just did manually. Or am I wrong here?

Maybe you could use some third-party software (such as a scheduler) to add the missing SQL Agent feature.

Is there anyone who tried this?
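
For anyone who wants to try the scheduler idea, here is a minimal Python sketch, assuming the sqlcmd client is installed and that you log in with Windows authentication. Only stats_rollup1_proc is a procedure name actually quoted in this thread; the function names and the server and database names in the usage note are placeholders of mine, and the other rollup and purge procedure names have to be looked up in your own VC database. It is untested against a real VC database, and whether such a schedule is needed at all is still an open question here.

```python
import subprocess

# Only this procedure name is quoted in the thread. Add the other rollup and
# purge procedures after checking their exact names in your own VC database.
DEFAULT_PROCS = ("dbo.stats_rollup1_proc",)


def build_sqlcmd(server, database, proc):
    """Build a sqlcmd command line that executes one stored procedure."""
    return [
        "sqlcmd",
        "-S", server,            # server\instance to connect to
        "-d", database,          # database to use
        "-E",                    # Windows (trusted) authentication
        "-b",                    # non-zero exit code if the batch fails
        "-Q", f"EXEC {proc};",   # run this query, then exit
    ]


def run_procs(server, database, procs=DEFAULT_PROCS, runner=subprocess.run):
    """Run each procedure in order; check=True stops at the first failure."""
    for proc in procs:
        runner(build_sqlcmd(server, database, proc), check=True)
```

A script that calls something like run_procs(r"localhost\SQLEXPRESS", "VCDB") (both names hypothetical) could then be registered with the Windows Task Scheduler to run at whatever interval you choose.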

slartimitvar
Contributor

I have a VC 2.5 and SQL2005 Express installation which is NOT an upgrade from previous versions. I still see the same issue as others here are seeing. Every 30 minutes on the dot, sqlservr.exe hits 99% cpu for a period of time and then returns to normal for the remainder of the 30 minutes. The 99% cpu period of time is slowly getting longer and longer (currently 20 min).

I followed jobo's advice and had a look at the stored procedures. The rollup procs are documented as being used to condense (1) 5 min interval stat data to 30 min interval stat data, (2) 30 min to 2 hr and (3) 2 hr to daily. The purge_stat procs are used to (1) purge all 5 min data older than 24 hours, (2) all 30 min data older than a week, and (3) all 2 hourly data older than a month plus all daily data older than a year.
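
To make those tiers concrete, here is a minimal Python sketch of the retention rules as described above. It is only an illustration, not VC's actual code: the function names are made up, and "a month" and "a year" are assumed to be 30 and 365 days.

```python
from datetime import timedelta

# (sample interval, how long samples at that interval are kept),
# taken from the description of the rollup and purge procedures above.
RETENTION_TIERS = [
    (timedelta(minutes=5), timedelta(hours=24)),
    (timedelta(minutes=30), timedelta(weeks=1)),
    (timedelta(hours=2), timedelta(days=30)),   # "a month": 30 days assumed
    (timedelta(days=1), timedelta(days=365)),   # "a year": 365 days assumed
]


def finest_interval_for_age(age):
    """Return the finest sample interval still kept for data of this age,
    or None if data that old should already have been purged."""
    for interval, keep_for in RETENTION_TIERS:
        if age <= keep_for:
            return interval
    return None


def expected_samples_per_counter():
    """Samples per counter in each tier if rollup and purge keep up."""
    return [keep_for // interval for interval, keep_for in RETENTION_TIERS]
```

Under these assumptions a single counter should hold at most 288 five-minute samples at any time, so a five-minute table that only ever grows would suggest that the purge or the rollup is not completing.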

So I think my 99% cpu usage periods are resulting from the automatic running of stats_rollup1_proc by SQL express even without any SQL agent.

All the purge_stat procs executed instantly for me and reported that they had affected one row, which is odd: I have about 5 weeks' worth of data, so the first two should have affected more than one row.

All 3 rollups also execute instantly and report that they had also affected one row. My VPX_HIST_STAT1 table still has 750000 records and continues to grow.

I don't know what to do next other than write it up and log it as a support request.

ksram
VMware Employee

VirtualCenter not responding may be due to the SQL Server process eating up the CPU.

The root cause is that the database upgrade is not complete.

A temporary workaround is to stop collecting performance statistics.

A possible workaround for the issue (without installing a fresh database) is given here. The downside is that you will lose any existing performance data. (There is a way to retain it, but it is tedious.)

1. Truncate the table VPX_HIST_STAT1.
2. Execute all the SQL commands (one by one) in the script cleanup_upgrade_mssql.sql on the VC database. (Default location: C:\Program Files\VMware\Infrastructure\VirtualCenter Server\dbupgrade\Upgrade-v3-to-v4\T-SQL)

You should then be able to get the performance statistics working again without any CPU hit.
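
If you want to step through the script one command at a time, a small helper like the following Python sketch can split it into individual batches first. It assumes the script separates its commands with GO lines, as T-SQL scripts commonly do; that has not been checked against cleanup_upgrade_mssql.sql itself. The function name is made up, and a GO followed by a repeat count is not handled.

```python
import re

# A line containing only GO (any case, optional surrounding whitespace)
# ends the current batch.
_GO_LINE = re.compile(r"^\s*GO\s*$", re.IGNORECASE)


def split_batches(script_text):
    """Split T-SQL script text into batches, dropping empty ones."""
    batches, current = [], []
    for line in script_text.splitlines():
        if _GO_LINE.match(line):
            batch = "\n".join(current).strip()
            if batch:
                batches.append(batch)
            current = []
        else:
            current.append(line)
    # Anything after the last GO is a final batch.
    tail = "\n".join(current).strip()
    if tail:
        batches.append(tail)
    return batches
```

Each returned batch can then be pasted into a query window and executed on its own, so that a failure is easy to attribute to a specific command.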

slartimitvar
Contributor

Yep, my SQL service was eating up all the CPU. In my case the problem can't have come from an incomplete database upgrade, as this is a fresh install of VC 2.5 on a fresh install of SQL 2005 Express.

I got lucky: I went back and followed jobo's advice again, in case I had done something sleepily the first time, and this time the stats_rollup1 procedure took half an hour to run. I restarted the VC service and everything has been running very smoothly since then. There are only 24000 records in my VPX_HIST_STAT1 table now!

Attached is an image of my VCMS performance history before and after this successful intervention.

joboo12
Enthusiast

I also have a fresh install with SQL Express and am experiencing the same issues. I followed jobo's instructions and it has been running solidly since. The chart below shows the gradual increase in CPU usage over a period of a month until failure.
