VMware Cloud Community
LluisClem
Contributor

vCenter down, again and again.

Hi, we're experiencing a problem with our vCenter. We're using ESX 4.1 U1.

The service goes down and Windows 2008 reports:

"The VMware VirtualCenter Server service terminated unexpectedly.  It has done this 1 time(s).  The following corrective action will be taken in 300000 milliseconds: Restart the service."

Searching the KB we found good advice, and this is what we've already done:

-Compacted the SQL database and changed the transaction logging mode.

-Searched for VMs with unstable snapshots or with more than 20 snapshots (none were found).

-Reinstalled vCenter and imported the old settings.

-Reinstalled vCenter as a clean install and moved the ESX nodes over.

-Updated vCenter to 4.1 U2.

-Disconnected the CD/DVD drives on the virtual machines.

None of these has worked.

The only thing that brings it back is restarting the ESX management services:

service mgmt-vmware restart
service vmware-vpxa restart

Sometimes the server stays up for a week, sometimes for an hour, but once the service goes down it won't come back up unless we restart the management services.

I've tried restarting the services one ESX host at a time to find a "broken" host, but each time a different host seems to cause the failure.
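
A minimal Python sketch of such a host-by-host restart, assuming root SSH access is enabled on each ESX host. The host names and the helper names (`build_command`, `restart_one_by_one`) are hypothetical. By default it only prints the commands instead of running them.

```python
import subprocess

# Hypothetical host names; replace them with your own ESX hosts.
HOSTS = ["esx01.example.local", "esx02.example.local"]

# The two restart commands quoted earlier in this post, chained together.
RESTART = "service mgmt-vmware restart && service vmware-vpxa restart"


def build_command(host):
    """Return the ssh invocation that restarts the management agents on one host."""
    return ["ssh", "root@" + host, RESTART]


def restart_one_by_one(hosts, dry_run=True):
    """Restart the agents host by host so vCenter can be checked in between."""
    for host in hosts:
        cmd = build_command(host)
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
            continue
        subprocess.run(cmd, check=True)
        input(host + " done. Check the vCenter service, then press Enter...")


if __name__ == "__main__":
    restart_one_by_one(HOSTS)
```

Pausing after each host makes it easier to see whether the vCenter service recovers after one particular restart.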

The last recovery caused the vCenter license to go missing: we still have the license, but vCenter entered evaluation mode. Even though we reassign the license, it returns to evaluation mode as soon as we refresh. The license key was entered during installation and was working until yesterday.

So after 4 vCenter reinstalls and lots of tests, I need some help.

vCenter logs from last night's failure are attached.

Regards.

Lluis

6 Replies
Shakaal
Hot Shot

Please provide the complete vCenter log set. I checked the log, but it doesn't give much information; it only says:

"The underlying connection was closed: A connection that was expected to be kept alive was closed by the server"

Could you provide the complete vCenter logs?

Thanks & regards

LluisClem
Contributor

Sorry, the vCenter log is too big. I've opened a support case and will report back as soon as it's solved.

Regards.

marcelo_soares
Champion

Paste only the vpxd.log file here. It is located under c:\users\application data\vmware\virtual center server\logs\

Marcelo Soares
LluisClem
Contributor

I'm getting errors in ADAM.

We have reinstalled the server, so ADAM should be new, but that doesn't solve the problem.

VMwareVCMSDS (2192) ADAMDSA:
The database page read from the file
"C:\Program Files\VMware\Infrastructure\VirtualCenter Server\VMwareVCMSDS\adamntds.dit"
at offset 303104 (0x000000000004a000) (database page 36 (0x24)) for 8192 (0x00002000)
bytes failed verification due to a page checksum mismatch. 
The expected checksum was -370989682832698270 (0xfad9fad95b6c1462) and the actual checksum was
-294147856004283294 (0xfbeafa155b6c0462).  The read operation will fail with error -1018 (0xfffffc06).
If this condition persists then please restore the database from a previous backup.
This problem is likely due to faulty hardware.
Please contact your hardware vendor for further assistance diagnosing the problem.

Main issues on the vCenter:

[2012-02-06 13:42:15,933 Thread-25  ERROR com.vmware.vim.health.impl.ComponentSpec] Unable to retrieve health for 8C6E2394-C586-4BBB-B116-8F3E87C02163.visvc from https://vcenter:8443/vws/Query/Health
[2012-02-06 13:42:15,933 Thread-25  ERROR com.vmware.vim.health.impl.ComponentSpec] Unable to retrieve health for 8C6E2394-C586-4BBB-B116-8F3E87C02163.visvc from any of its health URLs

org.xml.sax.SAXParseException: An invalid XML character (Unicode: 0x0) was found in the element content of the document.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
    at com.vmware.vim.health.impl.XmlUtil.getDocumentFromStream(XmlUtil.java:101)
    at com.vmware.vim.health.impl.ComponentSpec.retrieveHealthFromUrl(ComponentSpec.java:324)
    at com.vmware.vim.health.impl.ComponentSpec.retrieveHealth(ComponentSpec.java:267)
    at com.vmware.vim.health.impl.HealthPollerImpl.retrieveHealthFromUrl(HealthPollerImpl.java:116)
    at com.vmware.vim.health.impl.HealthPollerImpl.retrieveHealth(HealthPollerImpl.java:103)
    at com.vmware.vim.health.impl.HealthPollerImpl.computeHealth(HealthPollerImpl.java:185)
    at com.vmware.vim.health.impl.HealthPollerImpl.retrieveHealth(HealthPollerImpl.java:101)
    at com.vmware.vim.health.impl.HealthPollerImpl.pollHealth(HealthPollerImpl.java:84)
    at com.vmware.vim.health.impl.HealthPollerImpl.access$100(HealthPollerImpl.java:28)
    at com.vmware.vim.health.impl.HealthPollerImpl$PollerThread.run(HealthPollerImpl.java:54)
    at java.lang.Thread.run(Thread.java:619)

javax.naming.NamingException: [LDAP: error code 1 - 000020EF: SvcErr: DSID-020A072F, problem 5012 (DIR_ERROR), data -1018
]; remaining name ''
    at com.sun.jndi.ldap.LdapCtx.mapErrorCode(LdapCtx.java:3081)
    at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2987)
    at com.sun.jndi.ldap.LdapCtx.processReturnCode(LdapCtx.java:2794)
    at com.sun.jndi.ldap.LdapCtx.searchAux(LdapCtx.java:1826)
    at com.sun.jndi.ldap.LdapCtx.c_search(LdapCtx.java:1749)
    at com.sun.jndi.toolkit.ctx.ComponentDirContext.p_search(ComponentDirContext.java:368)
    at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(PartialCompositeDirContext.java:338)
    at com.sun.jndi.toolkit.ctx.PartialCompositeDirContext.search(PartialCompositeDirContext.java:321)
    at com.vmware.vim.jointool.ADAMInstanceManager.purgeConflictEntries(ADAMInstanceManager.java:794)
    at com.vmware.vim.jointool.util.ldaphealth.LdapHealthMonitor$LDAPDataPurgeConflictsTask.purgeConflicts(LdapHealthMonitor.java:595)
    at com.vmware.vim.jointool.util.ldaphealth.LdapHealthMonitor$LDAPDataPurgeConflictsTask.run(LdapHealthMonitor.java:571)
    at com.vmware.vim.jointool.util.ldaphealth.LdapHealthMonitor.run(LdapHealthMonitor.java:182)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)

LluisClem
Contributor

I found this article in the KB that explains how to back up ADAM:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102986...

When I tried to back up ADAM it failed, so I uninstalled vCenter once more, deleted all the Tomcat and vCenter files left on the local disk, and now it seems to be working.

I'm able to make backups, and vCenter is up and running.

jmouracade
Contributor

From your attachment:

Online defragmentation of database 'C:\Program Files\VMware\Infrastructure\VirtualCenter Server\VMwareVCMSDS\adamntds.dit' terminated prematurely after encountering unexpected error -1018.

From the log you posted:

The expected checksum was -370989682832698270 (0xfad9fad95b6c1462) and the actual checksum was
-294147856004283294 (0xfbeafa155b6c0462).  The read operation will fail with error -1018 (0xfffffc06).

Don't look any further than this. You're not having a VMware/vCenter issue per se. The ADAM database is corrupted and in an unrepairable state. If you're reusing the same database when you reinstall from scratch, you'll keep running into issues.

On the other hand, if the ADAM db gets recreated when you reinstall from scratch and you're *still* getting corruption every time, you have an underlying hardware issue. (Ask any Exchange admin what they feel when they see "Error -1018" in their event log :))
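
As a sanity check on the figures quoted above, here is a minimal Python sketch. The helper name `to_signed` is my own, and the offset formula (page number + 1) * page size is an assumption that happens to match the logged values; the log does not state it. The sketch reads 0xfffffc06 as a signed 32-bit integer and shows which bits differ between the expected and actual checksums.

```python
def to_signed(value, bits):
    """Interpret an unsigned integer as a two's-complement signed value."""
    if value >= 1 << (bits - 1):
        value -= 1 << bits
    return value


# Values copied from the event text quoted in this thread.
error_hex = 0xFFFFFC06
offset = 303104
page_number = 36
page_size = 8192
expected = 0xFAD9FAD95B6C1462
actual = 0xFBEAFA155B6C0462

# The hex error code as a signed 32-bit integer.
print(to_signed(error_hex, 32))

# Assumption: a page's file offset is (page number + 1) * page size.
print((page_number + 1) * page_size == offset)

# XOR leaves a 1 in every bit position where the two checksums disagree.
diff = expected ^ actual
print(hex(diff), bin(diff).count("1"))
```

The XOR only shows that the two checksum values differ in a few bit positions; it doesn't identify the cause of the corruption.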
