Our VC server has been crashing randomly lately, and I am curious whether anyone has seen similar problems. We are running VC 2.5 Update 3 and I see the following errors in the log file:
An unrecoverable problem has occurred, stopping the VMware VirtualCenter service. Check database connectivity before restarting. Error: Error[VdbODBCError] (-1) "ODBC error: (HY000) - [ODBC][Ora]ORA-01483: invalid length for DATE or NUMBER bind variable
" is returned when executing SQL statement "UPDATE VPX_VM SET TOOLS_STATUS = ? , TOOLS_VERSION = ? , GUEST_OS = ? , GUEST_FAMILY = ? , GUEST_STATE = ? , DNS_NAME = ? , IP_ADDRESS = ? WHERE ID = ?"
As far as I can tell the database servers are not having any issues and connectivity should not be an issue.
Traceroute (tracert) or another network diagnostic tool will determine this. Have your network team monitor the connection. The VC service is pretty resilient and is not going to crash for no good reason. The log clearly shows there is a problem with DB connectivity, so either monitor the DB connections from the DB side, or monitor both the network and the DB.
Have you checked the SQL/Oracle Database logs, to see if maybe they reveal something? I would suspect a network issue (since it's complaining the DB was intermittent) before the VC service/software, at least in this case.
Thanks for the input.
I am having the DBAs take a look at the database side of things to see if they can find anything. I tend not to think it's a connectivity issue, because we got an actual Oracle error returned (ORA-01483: invalid length for DATE or NUMBER bind variable) rather than just some random error message about a bad return on the query.
A brief look online turns up issues surrounding the server character set, which bears relevance to a DATE or NUMBER issue.
Can you find out from the DBAs or from your change control what changes, if any, have been made at the database end?
Can you do a tnsping to the database server from your VC server? TNSPING should come standard with the Oracle Client that is installed on your VC server.
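Where tnsping is not at hand, a plain TCP connection test to the listener port gives a rough equivalent. The following is only a minimal sketch: the function name `can_connect` and the host name in the usage comment are hypothetical, and 1521 is merely the common default Oracle listener port, which may differ in your environment. Unlike tnsping, it confirms only that the port accepts connections, not that the listener or service name resolves correctly.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Usage (hypothetical host name, assumed default listener port):
#   can_connect("oradb.example.com", 1521)
```

Running this on a schedule from the VC server and logging the results would show whether the crashes line up with any reachability gaps.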
It is clearly a data mismatch. The VC server is getting data of an invalid length from one or more VMs when it queries for virtual machine information. This could be a bug in VirtualCenter. However, before raising a flag, I would recommend checking that all the VMs in your environment have VMware Tools installed and updated to the latest version.
I don't think there is anything wrong with your database connectivity.
TNSPING and other such network connectivity checks don't show any issues. The problem is also very intermittent: VC will stay running for days and then hiccup and die.
My DBAs have come up with similar questions regarding the character set, since it affects the number of bytes used to store data. They looked at all the fields referenced in the failed update and said that every field except one is 255 characters long. The exception is the IP address field, which is 16 characters long. That should be plenty. I haven't yet dug in to find the character set VMware recommends. We currently have it set as follows:
I'll see what I can dig up on that. Any other insight out there is welcome. Thanks for the help.
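The point about the character set changing how many bytes a value needs can be illustrated with a short check. This is a sketch under stated assumptions, not a diagnosis: the column widths are the ones the DBAs reported above, the limits are treated as bytes (the actual columns may use character semantics), and the helper name `oversized_fields` is hypothetical. It also says nothing about DATE or NUMBER binds, which is what ORA-01483 names.

```python
# Column widths as reported by the DBAs in this thread (assumed here to be bytes).
COLUMN_LIMITS = {
    "TOOLS_STATUS": 255,
    "TOOLS_VERSION": 255,
    "GUEST_OS": 255,
    "GUEST_FAMILY": 255,
    "GUEST_STATE": 255,
    "DNS_NAME": 255,
    "IP_ADDRESS": 16,
}


def oversized_fields(row: dict, encoding: str = "utf-8") -> dict:
    """Return {column: byte_length} for values whose encoded size exceeds the limit."""
    result = {}
    for column, value in row.items():
        limit = COLUMN_LIMITS.get(column)
        if limit is None or value is None:
            continue  # unknown column or NULL value: nothing to check
        size = len(str(value).encode(encoding))
        if size > limit:
            result[column] = size
    return result
```

For example, a dotted IPv4 address always fits in 16 bytes, while a full-length IPv6 address string does not, and a name made of accented characters takes twice as many bytes in UTF-8 as in a single-byte character set.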
Now there's something I hadn't really considered. We did recently update the VC box to Update 3, but the client VMs have not yet been updated with the latest tools. There could be a version mismatch problem where bogus data is being passed back from VMware Tools.
Thanks for the idea. I will have to get all the tools updated and see if that helps the issue. I won't be able to get to that until next week, though.
This could be a bug in VirtualCenter. However, before raising a flag, I would recommend checking that all the VMs in your environment have VMware Tools installed and updated to the latest version.
I don't think there is anything wrong with your database connectivity.
Well, why wouldn't this be DB connectivity then? Bad data does not dictate a program error; programs should control the data, not the other way around. I don't care how far out of date VMware Tools are or how far behind in version the VMs are, VC should not disconnect because of an invalid query, period.
So this definitely points to connectivity; bad data should NOT drop the connection. VC will ignore null or unexpected data and move on, and it shouldn't simply disconnect. If VC had crashed, it wouldn't even have logged it (if they didn't anticipate bad data, they certainly wouldn't have handled the exception either), and/or VC wouldn't restart. The fact that VC reports the database is dropping tells me that the connection timed out (it's pretty forgiving) and that there is something seriously wrong with the connection between the VC server and the DB. Which is yet ANOTHER reason I don't support running VC in a VM: there are too many variables for this type of thing and connections are too inconsistent, but that's a topic for another day.
Bottom line is there IS a connection problem. Data won't cause VC to drop its connection. VC handles the error and is saying exactly what is going on: loss of communication and query disruption.
This is ODBC. It's passing a parameter to the Oracle DB which the DB doesn't like, causing the database (not VC) to abort the query and hence drop the connection, so it is the query that drops the connection.
ORA-01483: invalid length for DATE or NUMBER bind variable
Cause: A bind variable of type DATE or NUMBER is too long.
Action: Check your Oracle operating system-specific documentation for the maximum allowable length.
Given that this is a known Oracle Query problem, I would be willing to bet your Oracle Database needs a patch. Or the ODBC driver on your VC needs to be updated, or both. Either way there are numerous forums about this specific error all over the net. There is a patch/update for this to fix it. That's the problem. It's still causing VC to lose connectivity (not VC fault, DB rejects the query, forces the arrest, and DB then drops connection, and VC even tells you the error code).
Forcing shutdown of VMware VirtualCenter now
The logs never said VC is losing its connection. VC is forced to shut down because it is getting unexpected data. Again, this could be a problem with Oracle or a DB character set configuration issue.
And the error is coming from an update statement on VPX_VM while trying to update the VMware Tools status. So it could be an issue with an invalid length in the update statement.
UPDATE VPX_VM SET TOOLS_STATUS = ? , TOOLS_VERSION = ? , GUEST_OS = ? , GUEST_FAMILY = ? , GUEST_STATE = ? , DNS_NAME = ? , IP_ADDRESS = ? WHERE ID = ?
While I appreciate the input, I disagree with your conclusion. I'm not seeing any indication that the database connection has been dropped. What I see is a query passed to the database server, the database server giving a very clear answer that the data does not fit into the field (meaning bad data came FROM VC, not the other way around), and VirtualCenter treating the error reported back via ODBC as a reason to halt operations and shut down the service. Likely a self-preservation move: if we get an error message back from the database server (not bad or unexpected data, but an actual error), we'd rather shut down operations than risk corrupting data. I also see the fact that we received the error message as an indicator that the connection is still live. Otherwise we would not have received the error about the data not fitting into the field.
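The distinction argued here, a data error returned over a live connection versus an error that signals a lost connection, can be sketched as a small log classifier. This is only an illustration: the function name `classify` and the result keys are hypothetical, and the list of connectivity codes is a short, non-exhaustive sample of well-known Oracle codes. A False flag means only that none of the listed codes appeared, not that the network is proven healthy.

```python
import re

# Oracle codes that indicate a lost or unreachable connection (not exhaustive).
CONNECTIVITY_CODES = {
    "ORA-03113",  # end-of-file on communication channel
    "ORA-03114",  # not connected to ORACLE
    "ORA-12170",  # TNS: connect timeout occurred
    "ORA-12541",  # TNS: no listener
}

ORA_CODE = re.compile(r"ORA-\d{5}")
SQL_TABLE = re.compile(
    r'executing SQL statement "(?:UPDATE|INSERT INTO|DELETE FROM)\s+(\w+)'
)


def classify(log_text: str) -> dict:
    """Extract ORA codes and the target table from a VdbODBCError log entry."""
    codes = ORA_CODE.findall(log_text)
    table = SQL_TABLE.search(log_text)
    return {
        "codes": codes,
        "table": table.group(1) if table else None,
        # True only if one of the listed connectivity codes is present.
        "connectivity_code_seen": any(c in CONNECTIVITY_CODES for c in codes),
    }
```

Fed the entry from the first post, this reports ORA-01483 against VPX_VM with no connectivity code present, which matches the reading that the database answered the query rather than dropped it.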
That being said, I'm not saying it couldn't be a problem on the database server side. There's always the possibility that we're hitting some bug there. But the database has not had any changes made to it for quite some time. We've been running VC since 1.0 and did not have this issue until the recent installation of Update 3. Of course, I haven't been tracking it closely enough to say definitively that it's related to Update 3, either. Which is why I was looking for other people's ideas.
Thanks again for the feedback.
I have been having exactly the same problem since updating to VC 2.5 Update 3. Ours is dropping connections multiple times during the day. We are running SQL Server, not Oracle. We never had an issue until this update. Will be digging further.
I just restarted the service again this morning and have extended verbose logging running. Will post again after the next crash. I am thinking this may also be one of the latest MS Patches causing this as I am pretty sure this only happened over the last week and I believe that I installed VC Update 3 a few weeks ago.
I have seen different errors, but this one is from last night. It appears to be an ODBC error, but I can't see where it actually went down:
An unrecoverable problem has occurred, stopping the VMware VirtualCenter service. Check database connectivity before restarting. Error: Error[VdbODBCError] (-1) "ODBC error: (HY000) - [SQL Native Client][SQL Server]The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions." is returned when executing SQL statement "UPDATE VPX_ENTITY SET NAME = ? , TYPE_ID = ? , PARENT_ID = ? WHERE ID = ?"
We are seeing the same issue since moving to VC Update 3: VirtualCenter is crashing at least daily in both our datacentres.
The description for Event ID ( 1000 ) in Source ( VMware VirtualCenter Server ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: An unrecoverable problem has occurred, stopping the VMware VirtualCenter service. Check database connectivity before restarting. Error: Error[VdbODBCError] (-1) "ODBC error: (HY000) - [Microsoft][SQL Native Client][SQL Server]The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions." is returned when executing SQL statement "INSERT INTO VPX_IP_ADDRESS
We have had a call open with VMware and now have a call open with Microsoft (as VMware says it's SQL related), but nothing has worked so far.
From the digging I did it appears the problem is having the statistics level on anything past 1. Since the daily stats rollup runs every 30 minutes it can't handle the amount of transactions that the advanced statistics levels create.
My SQL server had 18 GB of RAM and 4 CPUs (DL585 G2) and still couldn't run the job without crashing. Since changing the statistics level back to 1 I have had no crashes. I had raised it when using vKernel's capacity planner, which required the statistics level to be set to 2 or greater.
I am having the same issue. It seems to have started after I went to Update 3. I have since upgraded to Update 4 and the problem is still there.
I am just guessing here, but I have a large VMware environment of about 40+ hosts and 900+ VMs. It seems that the past-day stats rollup is what is killing VC. It fails just about every second time it runs, and then there are times when I get lock errors and the whole thing comes tumbling down. I have also noticed that VC consumes memory and does not let it go. It starts around 100MB, grows over time to about 1.5GB, and then the service tanks. Looks like some type of leak.... Not good!!
I currently have a call open with support but any help would be appreciated...
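To put numbers on the suspected leak, one option is to sample the service's memory at intervals and fit a trend line. The sketch below is an assumption-laden illustration: the function names are hypothetical, the samples must be collected by some other means (for example Performance Monitor), and a steady positive slope suggests a leak but does not prove one.

```python
def growth_rate(samples):
    """Least-squares slope of (hours, megabytes) samples, in MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must span more than one timestamp")
    return num / den


def hours_until(limit_mb, samples):
    """Hours from the last sample until memory reaches limit_mb at the fitted rate.

    Extrapolates from the last sample using the fitted slope; returns None
    if memory is not growing.
    """
    slope = growth_rate(samples)
    if slope <= 0:
        return None
    _, last_m = samples[-1]
    return max(0.0, (limit_mb - last_m) / slope)
```

With the figures from the post above, the limit would be roughly 1500 MB, the point at which the service was seen to fail. Comparing the projected time against the actual crash times would show whether memory growth or the stats rollup is the better predictor.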