For the past 9 months our ESX 4.0 and ESXi 4.0 hosts have been disconnecting from vCenter Server about every 2 weeks. I initially thought this was due to the amount of traffic over the management vmnics. There was previously a single vSwitch with two vmnics that carried management, iSCSI, NFS and vMotion traffic. During a recent outage the only way I could get the hosts to come back online was to disconnect the vmk interfaces, which disabled all iSCSI, NFS, and vMotion traffic.
The most recent outage had none of these factors in place. I had a single vSwitch with a vMotion portgroup mapped to two vmnics and a management portgroup mapped to its own dedicated vmnic, with one of the vMotion vmnics as a standby. Even with this setup all three of our hosts eventually went down again, requiring a reboot to fix. We're now focusing on vCenter and whether the vpxa packages on it are corrupt. Before all this started we attempted to upgrade vCenter 4.0 to 4.1 and had to revert to a snapshot due to complications; the database was running on the local SQL Express. These issues began after that upgrade attempt and snapshot revert.
Our plan now is to do a clean install of vCenter 4.1 to replace our vCenter 4.0 and create a new database at the same time. Does this seem too extreme as a fix? In my mind the vpxa packages local to our current vCenter server may be corrupt, because when we try to reconnect vCenter to the ESX/ESXi hosts it seems to have an issue after I supply a username and password for the hosts. I'm assuming this is the point at which vCenter attempts to install the vpxa package and fails. I would appreciate a sanity check on all this.
Are you using the bundled SQL Express DB? How many hosts are you managing? What is the size of your VCDB? 4 GB is the maximum for the Express version. What is the uptime of your vCenter host OS? We typically like to reboot ours every 30 days, just because it's Windows.
Starting over is always an option, but if you do, you may want to consider using a Standard or Enterprise edition of SQL Server.
When we initially attempted the vCenter upgrade about 9 months ago we were using the bundled SQL Express DB. When the upgrade to vCenter 4.1 failed and we reverted to vCenter 4.0, we had issues starting the vCenter service. That was because the database had grown too large, so we migrated it to a SQL 2008 R2 database, which is currently 6.5 GB in size. We are currently managing six hosts: 5 ESX and 1 ESXi. We just had an outage, so our uptime is only 1 day, but we typically do not reboot the vCenter server. Rebooting it regularly sounds like a good idea to start. Does the theory of a corrupt vpxa package on vCenter sound plausible?
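As a quick check on the sizes mentioned in this thread, here is a minimal Python sketch that compares a database size against the 4 GB Express cap quoted above. The function names and the constant are made up for illustration; nothing here talks to SQL Server or vCenter, so the size has to be read off by hand.

```python
# Hypothetical helpers (not part of any VMware or SQL Server tooling).
# The 4 GB figure is the Express cap quoted earlier in this thread.
EXPRESS_LIMIT_GB = 4.0


def headroom_gb(db_size_gb, limit_gb=EXPRESS_LIMIT_GB):
    """Return the space left under the cap; a negative value means over the cap."""
    return limit_gb - db_size_gb


def over_limit(db_size_gb, limit_gb=EXPRESS_LIMIT_GB):
    """True when the database is larger than the cap."""
    return headroom_gb(db_size_gb, limit_gb) < 0


if __name__ == "__main__":
    vcdb_size_gb = 6.5  # size reported above for the migrated VCDB
    print(over_limit(vcdb_size_gb), headroom_gb(vcdb_size_gb))
```

With the 6.5 GB figure this reports that the database is 2.5 GB over the Express cap, which is consistent with the vCenter service failing to start before the migration to SQL 2008 R2.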