VMware Cloud Community
jme198
Contributor
Contributor

VI SQL database maintenance job failing

On my VI server I am using SQL installed locally for the database. There is a management job that runs, stats_rollup1_proc.sql every 1/2 hour and periodically it fails. From what I understand the stats_rollup1_proc.sql script is provided from VMWare. I see the following warning in the Event Viewer:

SQL Server Scheduled Job 'Past Day stats rollupVirutalCenter' (0x0B676FB115572C40B74C822D1424FC3A) - Status: Failed - Invoked on: 2009-12-22 14:00:00 - Message: The job failed. The Job was invoked by Schedule 4 (30 min schedule). The last step to run was step 1 (Past day stats rollup).

I am wondering if the job sometimes runs longer than a half hour to complete and then shows up as failed because another instance of the job begins to run? Would there be any harm in running the job every 45 minutes or hour?

0 Kudos
11 Replies
lorio
Contributor
Contributor

I have the same error message, I don't understand either... Why is it only failing "sometimes" and not everytime this job is run... Smiley Sad

My details : vCenter 2.5 U5 - SQL 2000. Win 2003R2 32bit.

0 Kudos
Wimo
Hot Shot
Hot Shot

We have this issue too. Opened a case, VM support had us purge all the performance stats - didn't fix it. One of our DBAs looked into it and determined it was a "deadlock" in DBA-speak, but he could not pinpoint the exact problem. Then it sort of fell off the radar screen. At this point I guess we'll make a New Years resolution to try to get it resolved once and for all.

Please post if you get it figured out!

0 Kudos
jme198
Contributor
Contributor

Wimo, how often is your SQL database scheduled to run the " daily stats rollup" script? It is currently running every 30 mintues here and I am wondering if the script sometimes takes too long to complete and then the next scheduled script runs and fails with an error message because the previous job is still running. I spoke with a DBA and he mentioned that a "deadlock" referred to a locked process, which further supports this theory.

0 Kudos
Wimo
Hot Shot
Hot Shot

It runs every 30 minutes like yours - I don't that's configurable. Under VC Mgmt Server Configuration>Statistics, you can check or uncheck, or change the Level. Not sure what is lost by unchecking the 30 minute one and waiting for the 2 hour one.

Funny thing is it does not happen every time. Today for instance, we get that event at 1, 2, 3, 4, 6, and 7:30 AM and not since (now 10 AM in my time zone).

0 Kudos
jme198
Contributor
Contributor

I am referring to an SQL script that runs against the database. This is all taking place on my Virtual Infrastruture Server. We changed the schedule to run every 45 minutes in the Enterprise Manager for SQL. We want to see if that will give the job enough time to finish before the next job needs to run. The SQL DB who helped me looked at the script that runs and noticed the script locks tables as it runs,so that is why the job fails at times. The script is trying to lock the same table twice and then it fails.

0 Kudos
Wimo
Hot Shot
Hot Shot

Are you (or your DBA) saying that this: SQL Server Scheduled Job 'Past Day stats rollupVC' is not the same thing as the Satistics thing that is invoked through Virtual Center?

jme198
Contributor
Contributor

I think the data is the same. I think that the setting in VI effects what the script does in Enterprise Manager.

0 Kudos
Frank_Poelert
Contributor
Contributor

I am having the same problems. I checked the SQL activity monitor and found out that there are several stored procedures trying to lock the same tables.

vCenter triggers the procedure 'load_stats', which fills the STAT1 tables with data.

The past day stats rollup reads this data every 30 minutes and consolidates it in the STAT2 tables. It locks the table while reading (why?).

Now when these procedure are active at the same time, one of them is suspended. Most of the time that will be load_stats_proc, because there will always be a rollup active or suspended. The next time this load procedure is triggered by vCenter another process is created, blocked by the first load_stats and itself blocking the stats_rollup1_proc. This is what you call a deadlock.

Disabling the statistics in vCenter means the load_stats_proc is not triggered, and the rollup jobs run succesfully.

The interval settings and keep time determine which data is transferred to the next level of consolidation by the rollup.

Funny how VMware always manages to improve on functionality that was working fine in a previous version Smiley Sad

I logged a support request for this as well.

0 Kudos
Dave_Mishchenko
Immortal
Immortal

A lock will be taken for a process to ensure that it gets a clean read of the data (i.e. the data is not being changed while the read is occuring). You can modify the code to specify a lock type, but that would be a last resort.

Do you have a maintenance job running regularly to update indexes and statistics? Also have you been able to narrow down the problem read operation? You might fix the problem query with an index.




Dave

VMware Communities User Moderator

Now available - vSphere Quick Start Guide

Do you have a system or PCI card working with VMDirectPath? Submit your specs to the Unofficial VMDirectPath HCL.

0 Kudos
Frank_Poelert
Contributor
Contributor

Hi Dave,

Thanks for the update.

Statistics are being updated, but not the indexes. I will try that too. I don't yet know which read is taking so long that it gets blocked. The Activity Monitor doesn't show that much info.

Regards,

Frank

0 Kudos
timmp
Enthusiast
Enthusiast

I am wondering if anybody has figured this out. We used to see this issue last year in vCenter 2.5 and thought when we started from scratch with a new 4.0 vcenter server and new DBs, these issues stopped. I have come to find out that these deadlocks are still randomly happening. The job for the stats rollup will run fine and then may randomly error out once in a day. Our issue is that we have alarms generated through monitoring software on our SQL servers and it is sends critical alerts via alarmpoint

I am going to open a case with VMware but have read many KBs without a solution. Our environment has a vCenter server running as a VM (vcenter v 4.0) residing on ESX 4.0 hosts connecting via ODBC to a clustered set of SQL 2005 servers running as physicals.

Any thoughts on this random deadlock?

0 Kudos