VMware Cloud Community
polovina
Contributor

VM not starting on second node in HA cluster when first node fails

Hi,

We created a test DRS/HA cluster in vCenter 6.0 that contains two hosts and one shared datastore. When a host enters maintenance mode, all VMs are migrated to the other node. But when we force power off a host, the VMs do not start on the second node; they stay in the "Powered On" state with Status "Unknown".

Our DRS/HA settings are:

DRS -> On and Fully Automated

HA -> ON

Host Monitoring -> On

Protect against Storage Connectivity Loss -> Off

Virtual Machine Monitoring -> Disabled

Host Isolation -> Power off and restart VMs

VM restart priority -> High

VM monitoring connectivity -> High

Admission Control -> Do not reserve failover capacity

Datastore for Heartbeating -> Use datastores from the specified list and complement automatically (so we have there one shared Datastore)

Why is the VM not starting on the other node? What are we doing wrong?
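
For reference, this is roughly how the settings above can be read back programmatically. A minimal pyVmomi sketch, not our exact tooling; the vCenter address, credentials and certificate handling are placeholders:

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Lab only: skip certificate verification
ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)

for cluster in view.view:
    das = cluster.configurationEx.dasConfig   # vSphere HA settings
    drs = cluster.configurationEx.drsConfig   # DRS settings
    print(cluster.name)
    print("  HA enabled:         ", das.enabled)
    print("  Host monitoring:    ", das.hostMonitoring)
    print("  Admission control:  ", das.admissionControlEnabled)
    print("  DRS enabled:        ", drs.enabled, "/", drs.defaultVmBehavior)
    if das.defaultVmSettings:
        print("  Restart priority:   ", das.defaultVmSettings.restartPriority)
        print("  Isolation response: ", das.defaultVmSettings.isolationResponse)

Disconnect(si)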

4 Replies
Kaustubhambulka
Enthusiast

Which procedure are you using to power off the node?

polovina
Contributor

We are using "Force Power Off" from the server's iLO.

I see some entries on the second node in fdm.log:

error fdm[FFB0BB70] [Originator@6876 sub=Election opID=SWI-60b7acd9] [ClusterElection::SendAll] [60 times] sendto 10.1.1.11 failed: Host is down

verbose fdm[FFCD1B70] [Originator@6876 sub=Policy] [LocalIsolationPolicy::ProcessDatastore] Issuing lock check for datastore /vmfs/volumes/594a2455-34d25690-c5ec-a0d3c102aa60

verbose fdm[FFA89790] [Originator@6876 sub=Cluster opID=SWI-a048760] [ClusterDatastore::DoCheckIfLocked] Checking if datastore /vmfs/volumes/594a2455-34d25690-c5ec-a0d3c102aa60 is locked

verbose fdm[FFA89790] [Originator@6876 sub=Cluster opID=SWI-a048760] [ClusterDatastore::DoCheckIfLocked] Checking if datastore /vmfs/volumes/594a2455-34d25690-c5ec-a0d3c102aa60 lock state is 3

verbose fdm[FFA89790] [Originator@6876 sub=Policy opID=SWI-a048760] [LocalIsolationPolicy::ProcessDatastoreLockState] check of /vmfs/volumes/594a2455-34d25690-c5ec-a0d3c102aa60 returned 3 (scheduled=true)

verbose fdm[FFCD1B70] [Originator@6876 sub=Policy] [LocalIsolationPolicy::ProcessDatastore] Issuing lock check for datastore /vmfs/volumes/594a2455-34d25690-c5ec-a0d3c102aa60

verbose fdm[FFA89790] [Originator@6876 sub=Cluster opID=SWI-2080867b] [ClusterDatastore::DoCheckIfLocked] Checking if datastore /vmfs/volumes/594a2455-34d25690-c5ec-a0d3c102aa60 is locked

verbose fdm[FFA89790] [Originator@6876 sub=Cluster opID=SWI-2080867b] [ClusterDatastore::DoCheckIfLocked] Checking if datastore /vmfs/volumes/594a2455-34d25690-c5ec-a0d3c102aa60 lock state is 3

verbose fdm[FFA89790] [Originator@6876 sub=Policy opID=SWI-2080867b] [LocalIsolationPolicy::ProcessDatastoreLockState] check of /vmfs/volumes/594a2455-34d25690-c5ec-a0d3c102aa60 returned 3 (scheduled=true)

The datastore 594a2455-34d25690-c5ec-a0d3c102aa60 is the shared datastore.

Does it mean that there is some lock on the VM files and the second node cannot get the correct status?
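
In case it helps anyone, a small Python sketch to count those lock-check results instead of scrolling through the log by hand. The log path is a placeholder and the pattern is based only on the entries quoted above:

import re
from collections import Counter

# Matches the "ProcessDatastoreLockState ... returned N" lines shown above
LOCK_RE = re.compile(
    r"ProcessDatastoreLockState\] check of (/vmfs/volumes/\S+) returned (\d+)")

def summarise(path="fdm.log"):
    counts = Counter()
    with open(path, errors="replace") as log:
        for line in log:
            m = LOCK_RE.search(line)
            if m:
                counts[(m.group(1), int(m.group(2)))] += 1
    for (datastore, state), n in counts.items():
        print(f"{datastore}: lock state {state} seen {n} times")

if __name__ == "__main__":
    summarise()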

Kaustubhambulka
Enthusiast

Lock state 3 is strange...

Mode 3 is used by MSCS or FT.

It should be 1 (VM powered on) or 0 (no lock).
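
As a rough reference, the mapping as I understand it (worth verifying against VMware's documentation for your build):

# Rough, unofficial mapping of VMFS lock modes
LOCK_MODES = {
    0: "no lock",
    1: "exclusive lock (VM powered on)",
    2: "read-only lock",
    3: "multi-writer lock (e.g. MSCS or FT)",
}

def describe(mode: int) -> str:
    return LOCK_MODES.get(mode, f"unknown mode {mode}")

print(describe(3))  # the state reported in fdm.log above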

polovina
Contributor

I found the problem. Ping was blocked on the default gateway, which HA uses as the isolation address. Since all the hosts are located in one subnet, I didn't consider that warning important.
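
For anyone hitting the same thing who cannot allow ICMP on the gateway, another option is to point HA at an isolation address that does answer ping, via the das.isolationaddress0 advanced option. A minimal pyVmomi sketch; the cluster name, vCenter address, credentials and the isolation IP are placeholders:

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab only
si = SmartConnect(host="vcenter.example.local", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)

content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == "TestCluster")  # placeholder cluster name

spec = vim.cluster.ConfigSpecEx(
    dasConfig=vim.cluster.DasConfigInfo(
        option=[
            # Stop using the default gateway as the isolation address...
            vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
            # ...and use an address that reliably answers ping instead (placeholder IP)
            vim.option.OptionValue(key="das.isolationaddress0", value="10.1.1.1"),
        ]
    )
)
task = cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
# Wait for the task to complete in real code before disconnecting
Disconnect(si)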

Anyway, talking through an issue helps to solve it! Thanks!
