2015

I would like to share an incident we faced with a vCenter service startup issue and how it was resolved.


Last week my colleague's environment had a problem starting the vCenter service, a few days after it had been upgraded to version 5.5. When we started the vCenter service, it went into the Starting state and, after a few minutes, into the Stopped state. The vpxd.exe crash generated no .dmp file, so we followed the normal troubleshooting procedures step by step. By analyzing the vpxd log, we confirmed that the service started initially and got through DB authentication, but it finally terminated with a message saying some directories could not be accessed. We tried everything we could think of, and as a last resort we decided to reinstall vCenter Server and map it to the existing SQL DB.

 

Before we proceeded, a Google search showed that Windows Error Reporting (WER) can be configured to collect crash dumps for applications and services in general. By design or by luck, we already had this feature enabled on our vCenter server. Refer to this link for the complete WER configuration: https://msdn.microsoft.com/en-us/library/windows/desktop/bb787181(v=vs.85).aspx
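For anyone who does not have WER dumps enabled yet, here is a minimal sketch of the registry settings that article describes, written as a small Python helper that only builds the `reg add` commands and does not run them. The value names (DumpFolder, DumpCount, DumpType) and the LocalDumps key come from the Microsoft article. The function name and the choice of a per-application key for vpxd.exe are my own assumptions for illustration; we did not have to do this ourselves since the feature was already on.

```python
# Sketch only: builds "reg add" command lines for WER local dumps.
# wer_localdump_commands is a hypothetical helper name, not part of any tool.
WER_KEY = r"HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps"


def wer_localdump_commands(exe_name="vpxd.exe",
                           dump_folder=r"%LOCALAPPDATA%\CrashDumps",
                           dump_count=10,
                           dump_type=2):
    """Return the reg.exe commands that enable local dumps for one executable.

    dump_type: 1 = mini dump, 2 = full dump (per the WER documentation).
    """
    # A subkey named after the executable limits the settings to that process.
    key = WER_KEY + "\\" + exe_name
    return [
        f'reg add "{key}" /v DumpFolder /t REG_EXPAND_SZ /d "{dump_folder}" /f',
        f'reg add "{key}" /v DumpCount /t REG_DWORD /d {dump_count} /f',
        f'reg add "{key}" /v DumpType /t REG_DWORD /d {dump_type} /f',
    ]


if __name__ == "__main__":
    for command in wer_localdump_commands():
        print(command)
```

The commands need an elevated prompt. Note that cmd.exe expands %LOCALAPPDATA% before reg.exe sees it, and that a service resolves the variable against the profile of the account it runs under, so it may be simpler to pass a fixed folder as dump_folder.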

 

As described in the article, we had a vpxd.dmp dump file in the configured location, %LOCALAPPDATA%\CrashDumps.

On checking the dump, its contents matched the VMware KB http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2065630

 

According to the KB, the cause is that the validation vCenter Server performs on its inventory data fails because of excessive snapshots. The workaround is to change the <ThreadStackKb> size to 1024 in the thread pool section of the vpxd.cfg file.
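The change itself is a one-line edit in a text editor, but as a rough illustration, here is a sketch in Python that updates the value in an XML string. The element name is taken as written in this post, so please check the KB for the exact spelling. The sample XML layout, the threadPool section name and the function name are assumptions for illustration only; a real vpxd.cfg contains many more settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for vpxd.cfg; the real file is larger.
SAMPLE_CFG = (
    "<config><threadPool><ThreadStackKb>512</ThreadStackKb>"
    "</threadPool></config>"
)


def set_thread_stack_kb(cfg_xml, size_kb=1024,
                        tag="ThreadStackKb", section="threadPool"):
    """Return cfg_xml with <tag> inside <section> set to size_kb.

    The section and the tag are created if they are missing.
    """
    root = ET.fromstring(cfg_xml)
    pool = root.find(f".//{section}")
    if pool is None:
        pool = ET.SubElement(root, section)
    element = pool.find(tag)
    if element is None:
        element = ET.SubElement(pool, tag)
    element.text = str(size_kb)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_thread_stack_kb(SAMPLE_CFG))
```

Whichever way the edit is made, take a backup copy of vpxd.cfg first and start the vCenter service again afterwards so the new value is picked up.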

 

We were not really sure whether that was the case in this environment, but we made the change as per the KB and it did the trick. The default value we had was 512 KB.

After it was fixed, we checked the whole environment and found lots of VMs running with snapshots. This was due to poorly planned roles and permissions: users with full permissions on the VMs were doing all these nasty things. Finally, the environment's roles and permissions were revisited and access restrictions were applied.


Regards,

Arvinth Rajkumar Ravi.