Ph_Seifert
Enthusiast

HA agent on ESX in Cluster has an error

Hi,

We have two ESX 3.5 servers in a cluster managed by VirtualCenter 2.5. Yesterday I migrated all VMs from one ESX host to the other because I wanted to patch that host with the latest patches from 18th September. I used Update Manager to patch the ESX host. We use a Sun StorageTek 2510 iSCSI RAID as shared storage; the ESX servers are connected directly to the storage with crossover cables. After the patches were installed, the ESX server rebooted, but afterwards the iSCSI storage LUN was no longer present. The storage adapter (QLogic QLA4052C) shows the one target LUN and the available paths to the storage, but if I refresh the storage in VirtualCenter, I can't see the LUN. On the other ESX host (not patched) the LUN is present and the machines are running.

I see the following entries in the log:

/var/log/vmware/hostd.log

Sep 22 18:06:25 obelix vmkernel: 0:00:32:56.106 cpu2:1040)ALERT: LVM: 4476: vml.0200000000600a0b800049c128000003e748c8efbf4c43534d3130:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.

Sep 22 18:06:25 obelix vmkernel: 0:00:32:56.123 cpu2:1040)ALERT: LVM: 4476: vml.0200000000600a0b800049c128000003e748c8efbf4c43534d3130:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.

Sep 22 18:06:35 obelix vmkernel: 0:00:33:06.141 cpu2:1040)ALERT: LVM: 4476: vml.0200000000600a0b800049c128000003e748c8efbf4c43534d3130:1 may be snapshot: disabling access. See resignaturing section in SAN config guide.
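
So the LUN itself is visible to the HBA, but the VMFS volume is hidden because LVM flags it as a snapshot. For anyone who wants to check this from the service console, something like the following works on ESX 3.x (vmhba2 is only an example adapter name):

# Rescan the iSCSI HBA (use your own adapter name here)
esxcfg-rescan vmhba2

# Show the paths the HBA sees to the target/LUN
esxcfg-mpath -l

# Show which SCSI devices are mapped to VMFS volumes;
# a LUN blocked by the snapshot check is missing from this list
esxcfg-vmhbadevs -m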

I changed the following options from

EnableResignature=0, DisallowSnapshotLUN=1

to

EnableResignature=0, DisallowSnapshotLUN=0

After this I can see the LUN as storage again.
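
In case it helps anyone else: the same change can also be made from the service console with esxcfg-advcfg instead of the VI Client (a sketch for ESX 3.x; a rescan is needed afterwards, and vmhba2 is only an example adapter name):

# Check the current values of the two LVM advanced settings
esxcfg-advcfg -g /LVM/EnableResignature
esxcfg-advcfg -g /LVM/DisallowSnapshotLUN

# Allow access to LUNs that LVM has flagged as snapshots
esxcfg-advcfg -s 0 /LVM/DisallowSnapshotLUN

# Rescan so the VMFS volume shows up again
esxcfg-rescan vmhba2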

Now I have the problem that the HA agent has an error. I see the following entries in the log files:

/var/log/vmware/hostd.log

Task Created : haTask-ha-host-vim.host.StorageSystem.refresh-131

Task Completed : haTask-ha-host-vim.host.StorageSystem.refresh-131

InvokePartedUtil /sbin/partedUtil

InvokePartedUtil /sbin/partedUtil

Error Stream from partedUtil while getting partitions: Geometry Known: 0

Hw info file: /etc/vmware/hostd/hwInfo.xml

Config target info loaded

Thank you

With kind regards

Philipp Seifert

jkumhar
Enthusiast

Hi,

Can you try to ping the default gateway of the current ESX server and the other ESX server?
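
For example, from the service console of the patched host (the IP addresses below are only placeholders):

# Ping the default gateway and the service console of the other ESX host
ping -c 4 192.168.1.1
ping -c 4 192.168.1.12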

Ph_Seifert
Enthusiast

Hi,

I have tried to ping and everything is OK. I can ping the default gateway and both ESX hosts can talk to each other.

Philipp

Aftenposten
Enthusiast

Hi Philipp,

This may not be your issue, but with ESX 3.5 Update 2, HA is no longer supported for a two-node cluster; you need at least three nodes in the cluster. This is from the release notes: when you put a server into maintenance mode, there is only one host left to VMotion VMs to. This violates HA admission control, because HA needs at least two servers, i.e. one spare server (Configured Failover Capacity).

Regards,

Gaute

Ph_Seifert
Enthusiast

Hi Aftenposten,

Thank you for the information about HA. But why was I able to activate HA in the cluster settings? There is no hint that it can only be activated with at least three ESX servers. I called VMware support and they uninstalled the VMware HA RPM packages on the failed ESX host. Then we activated HA and everything worked fine (the packages were freshly installed). Now I still have the problem with the iSCSI LUN and I have to resignature the LUN ID.

Thanks

Philipp

Aftenposten
Enthusiast

Like I said, the information I posted may not be relevant for your issue. I had a similar issue (same error message) and that was the answer I got from support. Apparently many people have reported this as a problem and they are looking into supporting it again in future updates. I totally agree with you that one should not be able to activate it if the feature is not supported. Do you mean that you have both HA and DRS working? My issue was that putting a host into maintenance mode doesn't VMotion the VMs off to the other host. It works fine when HA is disabled, but I want both HA and DRS. This worked pre-Update 2, so hopefully it will be back soon.

Ph_Seifert
Enthusiast

Hi,

I phoned VMware support and they uninstalled the following RPM packages from the ESX server (a sketch of the commands follows the package list). Then we disconnected and reconnected the ESX server in VirtualCenter, and after that everything worked fine. Then we activated VMware HA in the cluster and the HA packages were installed on the ESX host again.

VMware-vpxa-2.5.0-104215

VMware-aam-haa-2.2.0-1

VMware-aam-vcint-2.2.0-1
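
For reference, the removal looked roughly like this from the service console (package versions as in the list above; VMware support drove this, so treat it as a sketch rather than an official procedure):

# Show which VirtualCenter/HA agent packages are installed
rpm -qa | grep -E 'VMware-(vpxa|aam)'

# Remove the agents; they get pushed out again when the host is
# reconnected to VirtualCenter and HA is re-enabled on the cluster
rpm -e VMware-vpxa-2.5.0-104215 VMware-aam-vcint-2.2.0-1 VMware-aam-haa-2.2.0-1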

Even if I deactivate the HA setup, I still get the same error that the HA agent on the ESX host in the cluster has an error.

This is the output from /var/log/vmware/hostd.log:

Hw info file: /etc/vmware/hostd/hwInfo.xml

Config target info loaded

Task Created : haTask-ha-compute-res-vim.ComputeResource.reconfigureEx-20

Updated local swap datastore policy: false

Task Completed : haTask-ha-compute-res-vim.ComputeResource.reconfigureEx-20

I am searching for the reason for the HA error, even though I have not activated HA.

Thanks

Philipp

Iuridae
Contributor

What is the patch status of the two hosts? Are both hosts updated to the same patch level, or is only one updated? Can you use VMotion between the hosts?

From your first post there seems to be a problem accessing the iSCSI storage. Is that problem fully resolved?

Ph_Seifert
Enthusiast

Hi,

The patch status of the hosts is the same, and yes, I can use VMotion to migrate VMs. The problem with the iSCSI storage was fully solved by resignaturing the iSCSI LUN. Now the remaining problem is HA.
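
For completeness, the resignaturing itself can be done from the service console on ESX 3.x roughly like this (vmhba2 is only an example adapter name; after the rescan the datastore comes back with a generated "snap-..." label and the VMs have to be re-registered from it):

# Enable resignaturing temporarily
esxcfg-advcfg -s 1 /LVM/EnableResignature

# Rescan so the LUN is resignatured and mounted again
esxcfg-rescan vmhba2

# Turn resignaturing back off afterwards
esxcfg-advcfg -s 0 /LVM/EnableResignature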

Thanks

Philipp

Ph_Seifert
Enthusiast

Hi,

I fixed the HA agent error by installing the newest update of VirtualCenter (VirtualCenter Update 3). The problem is mentioned in the release notes. The machines are now in production. Sorry for the late answer.

Thanks all

Philipp

get2future
Contributor

On my side, I'm still using VirtualCenter Update 2 and I fixed the problem by reconfiguring HA on every other node of my cluster.

Then, reconfiguring HA on the "failed" node worked.

Cheers,

Chris
