VMware Cloud Community
AllBlack
Expert

Major HA issue with vCenter 4.1 and/or ESXi 4.1

Hi there,

I have had a few issues, but I am not sure what caused them, and I was hoping you could shed some light on them.

My enquiry consists of two parts.

Last week we upgraded from vCenter 4.0 U2 to vCenter 4.1. We did not experience any issues until yesterday, when I tried to add an ESXi 4.1 host to the cluster. The ESXi host would not configure for HA. I checked name resolution and the other usual suspects, and they were fine.

I tried removing the host from the cluster and re-adding it, but it still would not reconfigure for HA. Sometimes it worked for a while and then reported errors a bit later. I also disabled HA on the cluster, renamed the cluster and re-enabled HA.

All hosts configured fine except for the new host.

Since HA requires a common network name for its das.allownetwork setting, I renamed "Management Network" on the ESXi host to "Service Console" for the time being.
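Because das.allownetwork refers to networks by name, a single host whose port group is named differently is easy to miss. The following is a minimal sketch of a pre-check, assuming you have typed in each host's port group names yourself; the host names and the `find_hosts_missing` helper are hypothetical, and nothing here queries vCenter or calls any VMware API.

```python
"""Hypothetical pre-check: does every host have a port group whose name
matches one of the names used in the das.allownetwork setting?"""

from typing import Dict, List, Set


def find_hosts_missing(port_groups: Dict[str, Set[str]],
                       allowed_networks: Set[str]) -> List[str]:
    """Return the sorted names of hosts with no port group matching any allowed name."""
    return sorted(
        host for host, groups in port_groups.items()
        if not groups & allowed_networks  # empty intersection: no common network name
    )


if __name__ == "__main__":
    # Hypothetical inventory, entered by hand (not read from vCenter).
    inventory = {
        "esx01": {"Service Console", "VMotion", "Storage"},
        "esx02": {"Service Console", "VMotion", "Storage"},
        "esxi01": {"Management Network", "VMotion", "Storage"},
    }
    print(find_hosts_missing(inventory, {"Service Console"}))  # ['esxi01']
```

Renaming the ESXi port group to "Service Console", as described above, would make the returned list empty.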

Once more I tried to reconfigure the ESXi host for HA, and then I started to see all my hosts fail their HA configuration.

I even saw a message referring to a total cluster failure and another saying that failover was being initiated. Nothing actually failed over, and I assume that was because HA was not actually working on any of the hosts. Eventually it all settled, and I simply removed the ESXi host from vCenter. I have seen other people on the forum with HA issues after upgrading to vCenter 4.1.

The second part of my query is about integrating ESXi 4.1 into a cluster of ESX 4.0 hosts. We decided to build our new host with ESXi 4.1, as VMware will stop providing ESX classic. I wonder whether the significant differences between the two could have caused my problems.

In our current ESX 4.0 setup, the Service Console, vMotion and storage are each on a dedicated VLAN. The VMkernel default gateway is set to an address on the vMotion subnet. This has always worked for us.

Now we have to throw ESXi into the mix, and that works slightly differently. The Service Console is gone, and instead we have a management network VMkernel interface. Its IP sits on the same VLAN as the ESX classic hosts' Service Console.

I would have expected an issue here. You can only have one default gateway for the VMkernel. On ESXi I now have to set the default gateway to the one on my Service Console/management network VLAN, as it is the isolation address HA will use, and I actually need it to communicate with vCenter.

From previous experience with ESX classic, I know that if my VMkernel default gateway does not match on all hosts, vMotion will not work.

I would not, however, have expected any issues with HA.

In this case I cannot set a consistent VMkernel default gateway across all my ESX/ESXi hosts. My ESX classic hosts have no vmk interface within the range of my management VLAN, so I would not be able to specify the gateway I am using on ESXi.
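The gateway argument above comes down to subnet membership: a default gateway is only usable if it lies on a subnet that one of the host's VMkernel interfaces is attached to. The following is a minimal sketch of that reasoning, assuming a hypothetical addressing plan (10.0.10.0/24 management, 10.0.20.0/24 vMotion, 10.0.30.0/24 storage) and hypothetical helper names; it only checks addresses with Python's `ipaddress` module and does not configure anything on a host.

```python
"""Hypothetical check: which hosts could use a proposed VMkernel default
gateway, i.e. have a vmk interface on the gateway's subnet?"""

import ipaddress
from typing import Dict, List


def gateway_on_link(vmk_interfaces: List[str], gateway: str) -> bool:
    """True if the gateway lies inside the subnet of at least one vmk interface.

    Each interface is given in CIDR form, e.g. "10.0.20.11/24".
    """
    gw = ipaddress.ip_address(gateway)
    return any(gw in ipaddress.ip_interface(vmk).network for vmk in vmk_interfaces)


def hosts_without_gateway(hosts: Dict[str, List[str]], gateway: str) -> List[str]:
    """Return the sorted names of hosts that could not reach this gateway on-link."""
    return sorted(h for h, vmks in hosts.items() if not gateway_on_link(vmks, gateway))


if __name__ == "__main__":
    # Hypothetical addressing. On ESX classic the management IP belongs to the
    # Service Console, not to a vmk interface, so it is absent from esx01's list.
    hosts = {
        "esx01": ["10.0.20.11/24", "10.0.30.11/24"],
        "esxi01": ["10.0.10.21/24", "10.0.20.21/24", "10.0.30.21/24"],
    }
    print(hosts_without_gateway(hosts, "10.0.10.1"))  # ['esx01']
    print(hosts_without_gateway(hosts, "10.0.20.1"))  # []
```

Under these assumptions, only a gateway on the vMotion subnet is reachable from every host, which matches the dilemma described above.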

For some reason vMotion worked, but very, very slowly, while I got major HA issues, and I am not sure whether the two are related at all.

If my understanding of the VMkernel gateways is correct, how does one get around it? There is going to be a transition period, and I will have a mix of platforms for a while. Moving from classic to the hypervisor might be more complicated than it sounds and might require a complete overhaul of the architecture.

Please consider marking my answer as "helpful" or "correct"

3 Replies
tietzjd25
Enthusiast

My issue is just as odd: everything is on 4.1, with vCenter 4.1 and all hosts on ESXi 4.1. All hosts in the 8-host cluster were configured with host profiles and are currently in compliance. On half of the hosts HA works, and on the other half I get "Internal AAM ERROR - Agent could not start: Unknown HA error".

Reconfigure for HA does not work either.

Joe Tietz VCAP-DCD Solutions Architect
Gleed
VMware Employee

Here are a couple of KB articles that may help:

KB 1026037

KB 1007234

-Kyle

tietzjd25
Enthusiast

Got it resolved; it seemed to be a DNS/network traffic issue. I fixed some minor DNS errors that did not seem to affect the cluster when it was on ESX 4.0. After fixing those DNS issues, 7 of the 8 hosts configured correctly, and a reconfigure on the 8th host worked as well.
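Since minor DNS errors turned out to be the cause, it may be worth confirming before enabling HA that every host's forward and reverse lookups agree. The following is a minimal sketch under stated assumptions: the host names and the `dns_mismatches` helper are hypothetical, and the resolvers are passed in so the logic can be exercised with stub tables; against real DNS you would pass `socket.gethostbyname` and `system_reverse`.

```python
"""Hypothetical pre-check before enabling HA: confirm that each host's
forward (name -> IP) and reverse (IP -> name) DNS lookups agree."""

import socket
from typing import Callable, Dict, List, Tuple


def dns_mismatches(hostnames: List[str],
                   forward: Callable[[str], str],
                   reverse: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Return (hostname, problem) pairs for hosts whose lookups do not round-trip."""
    problems = []
    for name in hostnames:
        try:
            ip = forward(name)
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            problems.append((name, f"forward lookup failed: {exc}"))
            continue
        try:
            back = reverse(ip)
        except OSError as exc:  # socket.herror is a subclass of OSError
            problems.append((name, f"reverse lookup of {ip} failed: {exc}"))
            continue
        # Compare case-insensitively and ignore a trailing dot.
        if back.lower().rstrip(".") != name.lower().rstrip("."):
            problems.append((name, f"{ip} resolves back to {back}"))
    return problems


def system_reverse(ip: str) -> str:
    """Reverse lookup via the OS resolver; returns the primary host name."""
    return socket.gethostbyaddr(ip)[0]


def table_lookup(table: Dict[str, str]) -> Callable[[str], str]:
    """Build a stub resolver from a dict, raising OSError for missing keys."""
    def lookup(key: str) -> str:
        if key not in table:
            raise OSError(f"no record for {key}")
        return table[key]
    return lookup


if __name__ == "__main__":
    # Hypothetical records: esxi02 has a stale PTR record.
    fwd = {"esxi01.example.local": "10.0.10.21", "esxi02.example.local": "10.0.10.22"}
    rev = {"10.0.10.21": "esxi01.example.local", "10.0.10.22": "old-host.example.local"}
    for host, problem in dns_mismatches(sorted(fwd), table_lookup(fwd), table_lookup(rev)):
        print(f"{host}: {problem}")
```

With the stub records this prints one line, flagging esxi02 because its address resolves back to a different name.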

I'm not sure whether enabling HA on all 8 hosts at once was too much for the network or whether HA is just a tad touchy in ESXi 4.1 (since vMotion worked at all times).

Happy to have this cluster converted from ESX 4.0 to ESXi 4.1, with vMA fastpass and logging, and I even used vMA to configure MPIO and jumbo frames.

Joe Tietz VCAP-DCD Solutions Architect