VMware Cloud Community
bwatkins
Contributor

Upgraded 4 host Cluster to ESX 3.5.0 and now 2 hosts have HA/DRS error

Hi

I recently upgraded my 4-host ESX 3.0.2 server farm to ESX 3.5.0. The hosts are DL585 G2 servers (four dual-core CPUs) with 64 GB of RAM each.

Two of the hosts now show this error:

"Unable to apply DRS resource settings on host (Reason: A general system error occurred: Invalid fault). This can significantly reduce the effectiveness of DRS."

I have tried creating a new HA cluster and moving the hosts into it, but the same error still occurs. What logs should I look at to identify the cause?

Regards

Brett

16 Replies
tlowe
Contributor

I ran into this problem with two of my servers and ended up just reloading them. It was a lot quicker than trying to fix it.

zemotard
Hot Shot

In this case, you can also disable HA, wait a few minutes, and enable it again.

Normally that solves the issue.

Regards

wwolkers
Contributor

Have you checked the HA logs in /opt/LGTOaam512/log?

Perhaps there is some info in there.

(I'm not sure about the path, by the way. This is from ESX 3.0; maybe a newer version is running on the 3.5 hosts. I haven't been able to install one myself yet.)
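
As a rough sketch of how one might look at those logs from the service console, the helper below lists the most recently modified files in the HA log directory. The function name `newest_ha_logs` is made up for illustration, and the default path is only the ESX 3.0 location quoted above; whether 3.5 uses the same directory is an assumption, so pass a different path if yours differs.

```shell
#!/bin/sh
# Hypothetical helper: list the newest files in an HA agent log directory.
# ASSUMPTION: the default path is the ESX 3.0 location quoted in this
# thread; it may differ on ESX 3.5 hosts.
newest_ha_logs() {
    log_dir="${1:-/opt/LGTOaam512/log}"
    count="${2:-5}"
    if [ ! -d "$log_dir" ]; then
        echo "log directory not found: $log_dir" >&2
        return 1
    fi
    # ls -t sorts newest first; head keeps only the first few names
    ls -t "$log_dir" | head -n "$count"
}
```

Calling `newest_ha_logs` with no arguments checks the default directory; the files it prints can then be read with `tail` or `less`.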

Oakland
Contributor

It took just under an hour for my 3 hosts to configure HA. This included me fooling around with disabling and enabling it (perhaps unneeded). It was slow to load though.

WINTEC
Contributor

Hi,

I also have this error. I have reconfigured HA and enabled/disabled it several times but have been unable to resolve it. I do not want to rebuild the ESX 3.5 hosts if I can avoid it. The HA/DRS problems started about 2 weeks after it was initially enabled.

I don't believe rebuilding is the answer. Also, I cannot locate the logs mentioned above. Can someone please post their location under 3.5?

Does anyone have any answers?

Cheers

NorbK
Enthusiast

Getting similar messages here except I have "Insufficient CPU resources" for one error and "the request refers to an object that no longer exists or has never existed". Add that to crazy DRS spazzing when DRS migrations occur (that was before these messages started appearing) and the head starts spinning. Up to now I've never had an issue upgrading ESX but this 3.0.2 to 3.5.0 has completely changed how I'll look at upgrading in the future (like maybe 6 months after RTM!)
I'm curious as to how you all upgraded. Was it from the tar file or CD? Mine was with the tar file.

wanstor
Contributor

I've also had the same problem.

I found that putting the host into maintenance mode (i.e. migrating all guests off), removing it from the cluster (destroy host), and then re-adding it fixed my problems.

gmaze
Contributor

I had the same issue. I restarted the vCenter service, and on the host with the issue I also restarted the management console service with "service mgmt-vmware restart". This resolved it for me.
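
A minimal sketch of that host-side restart, wrapped so it can be previewed first. The function name `restart_mgmt_services` and the `DRY_RUN` variable are hypothetical; only the `service mgmt-vmware restart` command itself comes from this thread. By default the function just prints what it would run.

```shell
#!/bin/sh
# Hypothetical wrapper around "service <name> restart" on the ESX
# service console. DRY_RUN defaults to 1, so nothing is restarted
# unless DRY_RUN=0 is set explicitly.
restart_mgmt_services() {
    for svc in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: service $svc restart"
        else
            # Stop at the first service that fails to restart
            service "$svc" restart || return 1
        fi
    done
}
```

Run `restart_mgmt_services mgmt-vmware` to preview, then `DRY_RUN=0 restart_mgmt_services mgmt-vmware` as root to actually restart the service.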

AEZSIT
Contributor

I had the same problem, but the fixes here did not work. What did work was applying all the new patches to the server and rebooting it. It is all working fine now. About 9 patches were needed after installing 3.5.

spinworld
Contributor

Had the same issue; disabling and re-enabling HA/DRS three times brought no luck. I went through some of our VMs and noticed that some of them had never unmounted the Tools installer. After unmounting VMware Tools, DRS is back. Hope this helps!

TaFfin
Contributor

spinworld wrote: "Had the same issue; disabling and re-enabling HA/DRS three times brought no luck. I went through some of our VMs and noticed that some of them had never unmounted the Tools installer. After unmounting VMware Tools, DRS is back. Hope this helps!"

That did the trick for me also. One VM was still installing the tools. That caused DRS to fail on one of our ESX servers.

Michael.

deepu_cherian
Contributor

Unmounting the stuck VMware Tools installer on one of the VMs resolved my issue. If you see that the VMware Tools installation is still running, select "End VMware Tools Install". This will resolve the DRS error.

Note: I was able to clear this error by disabling DRS (thereby removing the resource pools), but as soon as I recreated the resource pools the error came back.

Unmounting the Tools installer fixed my issue completely and allowed me to retain my DRS setup and resource pools.
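
For anyone with many VMs, here is a hedged sketch of one way to look for candidates from the service console instead of clicking through each VM. It rests on assumptions not confirmed in this thread: that the Tools installer images sit under a directory named "isoimages", that a mounted installer shows up as a fileName entry in the VM's .vmx file, and that datastores are under /vmfs/volumes. The function name `find_tools_iso_mounts` is made up. Treat the output as a list of VMs to check in the client, not as proof.

```shell
#!/bin/sh
# Hypothetical helper: print .vmx files whose virtual CD-ROM appears to
# point at a VMware Tools installer image.
# ASSUMPTIONS: Tools images live in a directory called "isoimages", and
# a mounted installer is recorded as a fileName line in the .vmx file.
find_tools_iso_mounts() {
    vm_root="${1:-/vmfs/volumes}"
    find "$vm_root" -type f -name '*.vmx' 2>/dev/null |
    while IFS= read -r vmx; do
        # Case-insensitive match on a fileName entry mentioning isoimages
        if grep -qi 'fileName.*isoimages' "$vmx"; then
            echo "$vmx"
        fi
    done
}
```

Any VM listed can then be opened in the client to end the Tools install as described above.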

Deepu

jack8139
Contributor

I tried all the steps mentioned, like restarting the VPXA agent service and the management service, re-adding the ESX hosts to the cluster, and checking and unchecking the HA/DRS cluster settings. However, my issue was resolved only after I restarted the VirtualCenter service. It looks like it had some bad cache, and restarting the service emptied it.

sasar
Contributor

Unmounting VMware Tools helped me too. Reconfiguring DRS and HA did not.

mnasir
Enthusiast

I got the same error after upgrading my ESX server to 3.5 U4. The following steps resolved it for me:

1. Disconnect your host from the cluster and open an SSH session to that host.

2. Use su - to become root.

3. List the installed VirtualCenter agent and HA packages:

rpm -qa | egrep -i '(vpx|lgto|aam)'; ls /tmp/vmware-root

4. Remove those packages, delete pools.xml and license.cfg, remove the vpxuser account, and restart the management service:

rpm -e `rpm -qa | egrep -i '(vpx|lgto|aam)'`; rm -rf /etc/vmware/hostd/pools.xml; rm -rf /etc/vmware/license.cfg; userdel vpxuser; service mgmt-vmware restart

5. Watch the hostd log and wait for BEGIN to show up on your console:

tail -f /var/log/vmware/hostd.log | grep BEGIN

6. Reconnect the host to the cluster.
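
The `tail -f ... | grep BEGIN` step never returns on its own, so if you want to script the wait, something like the sketch below may help. It polls the log until a marker appears after a given line number, or gives up after a timeout. The function name `wait_for_marker` and its arguments are hypothetical, and treating a shorter file as a rotated log is my own assumption about how hostd.log behaves.

```shell
#!/bin/sh
# Hypothetical helper: wait until MARKER appears in LOG_FILE after line
# START, or give up after TIMEOUT seconds (default 300).
wait_for_marker() {
    log_file="$1"
    marker="$2"
    start="${3:-0}"
    timeout="${4:-300}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ -f "$log_file" ]; then
            total=$(($(wc -l < "$log_file")))
            # ASSUMPTION: a file shorter than before means the log was
            # rotated, so scan it from the top.
            if [ "$total" -lt "$start" ]; then
                start=0
            fi
            # Only look at lines written after the recorded position
            if tail -n "+$((start + 1))" "$log_file" | grep -q "$marker"; then
                return 0
            fi
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

One way to use it: record the line count with `before=$(wc -l < /var/log/vmware/hostd.log)` just before restarting the management service, then call `wait_for_marker /var/log/vmware/hostd.log BEGIN "$before" 300`. A return code of 0 means the marker showed up within the timeout.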

Thanks - Please consider giving points, if you think this post was helpful.

unexpected
Contributor

I had the same problem after upgrading to build 226117. I tried disabling/enabling HA without success. Running "service mgmt-vmware restart" cleared the error, but it came back after a few moments. Restarting the host did not solve the problem either and gave the same result as restarting the service.

I finally managed to solve the problem by removing the host from the cluster, connecting to the removed host with the VIC, and deleting all resource pools on it. Afterwards I added the host back to the ESX cluster and the issue seemed to be solved.

Now let's see if it stays like this... fingers crossed.

Geek, tech-enthusiast, VCP3, VCP4 & blogger @ http://www.unexpected.be | twittering @ http://twitter.com/unexxx