Just curious if anyone else has been experiencing issues where their cluster nodes randomly report an error with HA. As a result, the cluster is degraded. Right now I have multiple nodes with the same error in a cluster of 3 nodes, which nullifies HA for the entire cluster. Ironically, DRS still appears to work for the nodes experiencing HA problems. The unfortunate part is that the interface simply states that the agent has an error but doesn't provide any information beyond that.
Thus far I have been unable to correct the issue (other than rebuilding the node).
I have an open case with VMware Support but we haven't found a resolution and I have just been handed off to another technician.
Keith,
Thank you for posting that! I was indeed searching, and was going through the exact same thing: rebooting, checking for DNS issues, "Reconfigure for HA" on the hosts, etc. Simply disabling HA on the cluster and re-enabling it solved this issue for me!
EDIT: This was on VI 3 with the latest patches applied, but it seems the issue is still there.
Message was edited by:
Svante
Not trying to thread-jack, but I am running into a very similar issue. Our ESX setup is pre-production while I wait on a new switch for the iSCSI traffic, VLANs, etc.
Anyway, I get the errors described above when trying to set up HA. I am pretty sure that it is a DNS issue, and I need some quick help with resolv.conf and nsswitch.conf.
I do not have a DNS server set up in the test environment (yes, I know it is a 'requirement' for HA), and I want to set everything up so that it works without a DNS server, so that in the future it can withstand a DNS server failure. I only have 2 ESX boxes, so it is very manageable.
I have added the short name and FQDN for each ESX box and for VirtualCenter to the hosts file on both ESX servers. They can all ping each other.
The resolv.conf only has one line "search xxx.edu"
The nsswitch.conf has "hosts: files dns"
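For reference, with "hosts: files dns" the resolver consults the hosts file before DNS, which is what lets lookups succeed without a DNS server. A minimal sketch for confirming that order on a host is below. The function names are made up for illustration, and it assumes bash and awk are available on the service console.

```shell
# Print the lookup sources listed on the "hosts:" line, one per line.
# The file argument is optional and defaults to /etc/nsswitch.conf.
hosts_lookup_order() {
    local file="${1:-/etc/nsswitch.conf}"
    awk '$1 == "hosts:" { for (i = 2; i <= NF; i++) print $i; exit }' "$file"
}

# Succeed only if "files" is the first source, i.e. the hosts file
# is consulted before DNS.
files_first() {
    [ "$(hosts_lookup_order "$1" | head -n 1)" = "files" ]
}
```

Running `files_first && echo "hosts file is checked first"` on each ESX box is one quick way to confirm both hosts are configured the same way.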
Networking-wise, the Service Console and VMkernel traffic are each on their own NIC and in separate IP blocks, with no routing between them. I have added a VMkernel port on the Console switch for NFS access.
In the logs I see a few errors:
\[2007-07-23 05:40:47.491 'App' 12782512 warning] Fault Msg: "Unable to change license state as the license server is not available."
\[2007-07-23 05:41:07.859 'App' 9935792 error] \[VpxaVMAP::Invoke] Command /usr/bin/perl /opt/LGTOaam512/vmware/aam_config_util.pl -z -cmd=listnodes -domain=vmware failed with error 1
\[2007-07-23 05:41:07.859 'App' 9935792 error] \[VpxaVMAP::Invoke] Command /usr/bin/perl /opt/LGTOaam512/vmware/aam_config_util.pl -z -cmd=listnodes -domain=vmware failed with error 1
\[2007-07-23 05:41:07.859 'App' 9935792 error] \[VpxaVMAP::Invoke] Command output:
KEY: -z VAL: 1
KEY: domain VAL: vmware
KEY: cmd VAL: listnodes
CMD: hostname -s
RESULT:
\----
vmesxdl3950
CMD: /opt/LGTOaam512/bin/ft_gethostbyname vmesxdl3950 |grep FAILED
RESULT:
\----
ft_gethostbyname(vmesxdl3950) FAILED!
VMwareerrortext=failed to resolve hostname/ip using short hostname vmesxdl3950
VMwareerrorcat=hostmisconfigured
VMwareresult=failure
Total time for script to complete: 0 minute(s) and 0 second(s)
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
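The script output above shows the failing step: it takes the result of `hostname -s` and tries to resolve that name. A rough way to run a similar check by hand against the hosts file is sketched below. The function name is hypothetical, it only inspects the hosts file (it does not call ft_gethostbyname), and it compares names case-sensitively as a conservative assumption.

```shell
# Succeed if NAME appears as a hostname or alias in a hosts file.
# Usage: name_in_hosts NAME [FILE]   (FILE defaults to /etc/hosts)
name_in_hosts() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v want="$name" '
        { sub(/#.*/, "") }            # strip comments
        NF >= 2 {                     # field 1 is the address, the rest are names
            for (i = 2; i <= NF; i++) if ($i == want) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$file"
}
```

For example, `name_in_hosts "$(hostname -s)" || echo "short hostname is missing from /etc/hosts"` would flag a host whose own short name has no matching hosts entry.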
So can any gurus spot the problem right off? I have no clue why the license server issue is listed ...
Aww Frign @#$!
I just realized the hostname on one of the servers is wrong ...
It should be vmesxdl3850, not vmesxdl3950. Is there a good way to change the hostname? Remove and re-add to VC?
Alright it was the hostname issue =(
Solved by reconfiguring DNS and Routing on the server & rebooting.
What's the command, as root, to check the name search? I am having a brain cramp.
I had the same issue. This is what I did; try it, maybe it will help.
I made a list of everything I need to check prior to clustering some of my servers. This is to standardise the network configuration on all my hosts. Hope this helps somebody.
Ensure the network configuration is correct in the following config files:
1. PuTTY into the ESX host
2. Log on as root
The following you can copy and paste as is
3. vi /etc/sysconfig/network
4. vi /etc/hosts
5. vi /etc/resolv.conf
6. service network restart
7. In the VI Client, go to Configuration, Software - DNS and Routing. Ensure that the configuration matches what you changed above.
PS: It would seem that case does matter.
8. Migrate all guests off the host.
9. Restart the ESX server.
Any suggestions or comments are welcome.
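As a possible companion to the checklist above, here is a small sketch that compares the HOSTNAME value in /etc/sysconfig/network with the name the host is actually running under. The function names are hypothetical. It assumes the file carries a HOSTNAME= line, as on Red Hat style systems, and it compares exactly, including case, since case appears to matter.

```shell
# Print the HOSTNAME value from a sysconfig network file.
# The file argument is optional and defaults to /etc/sysconfig/network.
configured_hostname() {
    local file="${1:-/etc/sysconfig/network}"
    sed -n 's/^HOSTNAME=//p' "$file" | tr -d '"' | head -n 1
}

# Compare the configured name with the running one (exact, case-sensitive).
# Usage: check_hostname_consistency [FILE] [RUNNING_NAME]
check_hostname_consistency() {
    local configured running="${2:-$(hostname)}"
    configured="$(configured_hostname "$1")"
    if [ "$configured" = "$running" ]; then
        echo "OK: $configured"
    else
        echo "MISMATCH: configured='$configured' running='$running'"
        return 1
    fi
}
```

Calling `check_hostname_consistency` with no arguments after step 6 would show whether the restart picked up the intended name before any guests are migrated.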