matt82
Contributor

vSAN vSphere HA

Hi,

I'm trying to set this up as simply as possible in a lab.

I just want to see my tiny, newly created 8 GB VM guest fail over from one host to the other.

vCenter 6.7 uj, latest version.

3 virtual nodes, running ESXi:

2 identical data nodes, each with 1x 50 GB cache VMDK and 1x 200 GB capacity VMDK, and 2 NICs: 1x 1GbE (management) and 1x 10GbE (configured for vSAN).

1 witness node.

I've enabled vSphere HA and DRS.

Everything enabled OK as a vsanDatastore, which I can browse through vCenter.

I can copy files manually to the vsanDatastore.

I've uploaded a virtual guest (Windows Server 2012 R2, evaluation mode, brand new, with a few folders on the desktop for fun) from VMware Workstation. I just dragged it across to the vCenter's datacenter and into the cluster, selecting the vsanDatastore as the destination disk.

(What worried me is that the 10GbE network wasn't going crazy when I copied this to one node. I've looked at S2D from Microsoft before now, and it instantly starts synchronising to the 2nd node.)

I've started the VM guest on Node1. Happy days. I'm pinging the guest from a 4th machine.

I disconnect the network cables on Node1 to simulate a power cut on one node (or a motherboard failure, bang!).

Uh oh!! No ping replies. Wait. Wait. Wait.

I'm clicking around, looking for an alert. I can still get the vCenter interface up, and of course I look in there.

"lost communication with node 1"

"insufficient vsphere ha failover resources "

They're identical nodes. I can't click on the VM guest and click Migrate. I can't click Play. The vsanDatastore is empty.

I just want this to be easy and work as the theory books say it should, to see it working in practice, and then I can decide if it's good enough for production.

Any advice, or links to YouTube for a practical implementation showing all the features working, would be appreciated. Otherwise I've lost 10 hours of my life.

Thanks very much

🙂

3 Replies
matt82
Contributor

I've been into Policies -> VM Storage Policies -> vSAN Default Storage Policy.

Looking in the storage compatibility, mine doesn't appear there. Not sure if it should or not??

If I tick Force Provisioning on the previous advanced policy rules step, then it does show up.

Does it need more 200 GB hard disks adding? Does it like to have 4 or 5?

matt82
Contributor

I've just forced the provisioning through. The 10GbE card went crazy for a bit, which looks promising.

I've again disconnected Node2 from the network (both NICs) to simulate a mainboard failure. Bang!

But same again.

I can't even drag the VM guest to Node1 (through vCenter). It pops up with "Move To: this action is not available...."

vsanDatastore now has a folder named Windows Server 2012, but that's empty.

Not good!

TheBobkin
VMware Employee

Hello Matt,

Welcome to Communities and vSAN.

"uh oh!! no pinging. wait. wait. wait."

So, firstly, even if HA and vSAN are properly configured, this is expected behaviour. HA cannot magically migrate a VM off a dead/isolated node; what it does instead is detect that this has happened and restart the VM on a remaining node (provided the data is sufficiently available). Note that this is not instantaneous: for zero-downtime failover you would configure FT VMs or some form of in-VM clustering/redundancy/load-balancing.

"i can't click play. the vsandatastore is empty."

What this means is that the data (including the namespace Objects, AKA VM folders) is inaccessible. This could be due either to the Objects being provisioned as FTT=0, which have no redundancy (so you just lost either the whole or part of the only copy), or to some other issue with the cluster (e.g. the remaining node was no longer clustered with the Witness).

"ive been into policies  -> vm storage policies -> vsan default storage policy

looking in the storage compatibility, and mine doesnt appear there. not sure if it should or not??"

This means that the cluster, in its current state, is not capable of provisioning data with the chosen Storage Policy (SP) rules. For example, if you only have 2 nodes available (e.g. 1 data node + Witness), then you can't use an SP that requires a minimum of 3 nodes for component placement (such as the Default SP rules).
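
To make the arithmetic concrete, here's a rough Python sketch. It's a simplified model, not vSAN code, and the function names are made up for illustration. With mirroring (RAID-1), tolerating n failures means placing n+1 full replicas plus n witness components, each in its own fault domain, so you need 2n+1 fault domains:

```python
def min_fault_domains_raid1(ftt: int) -> int:
    """Hypothetical helper: fault domains needed for a RAID-1 policy with the given FTT.

    FTT+1 full replicas plus FTT witness components, each in a separate fault domain.
    """
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return 2 * ftt + 1


def policy_is_compliant(ftt: int, available_fault_domains: int) -> bool:
    """Hypothetical helper: can the cluster place all components for this FTT?"""
    return available_fault_domains >= min_fault_domains_raid1(ftt)


# 2-node cluster + Witness, all healthy: 3 fault domains.
print(policy_is_compliant(ftt=1, available_fault_domains=3))  # True
# Only 1 data node + Witness usable: 2 fault domains.
print(policy_is_compliant(ftt=1, available_fault_domains=2))  # False
```

So with the Default SP (FTT=1, RAID-1) a 2-node cluster only works if both data nodes and the Witness are all contributing storage and are properly clustered.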

"if i tick force provisioning, on the previous advanced policy rules step, then it does show up."

Yes, but that's because Force Provisioning means it will create FTT=0 data if FTT=1 data creation is not possible. In other words, you don't have 3 nodes with available storage that are clustered properly here.
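
A simplified Python model of that fallback (again just an illustration, with a made-up function name, and only covering the RAID-1 2n+1 placement rule): when the requested FTT can't be placed, Force Provisioning doesn't fail the deployment, it quietly gives you a single copy instead:

```python
from typing import Optional


def provision_ftt(requested_ftt: int, available_fault_domains: int,
                  force_provisioning: bool) -> Optional[int]:
    """Hypothetical helper: the FTT an object actually ends up with, or None if creation fails.

    Simplified model: RAID-1 needs 2*FTT+1 fault domains to place all components.
    """
    needed = 2 * requested_ftt + 1
    if available_fault_domains >= needed:
        return requested_ftt
    if force_provisioning:
        # Falls back to a single copy: no redundancy, and the object is non-compliant.
        return 0
    return None


print(provision_ftt(1, 2, force_provisioning=False))  # None: VM creation fails
print(provision_ftt(1, 2, force_provisioning=True))   # 0: created, but one copy only
```

That's why the VM "showed up" once Force Provisioning was ticked, and also why losing the node holding that one copy takes the VM with it.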

"i cant even drag the VMGuest to the Node1 (through vCenter). it pops up with Move To: this action is not available...."

As I said above, how exactly would you expect vCenter to move/migrate a VM from an ESXi host it is not currently connected to? Secondly (if you had this configured correctly, which it looks like you don't), even if the host was connected back to vC (but not to vSAN), the VM would be dead/crashed, as its Objects (e.g. vmdks) would be inaccessible from this host because they would have lost quorum.
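
Roughly, a vSAN object stays accessible only while more than 50% of its component votes are reachable and at least one full replica is among them. Here's a simplified Python model of that rule. Assumptions: one vote per component (real vSAN can assign different vote counts), and the host names node1/node2/witness are just placeholders for this lab:

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Component:
    host: str        # placeholder host name
    kind: str        # "replica" (full data copy) or "witness"
    votes: int = 1   # simplified: one vote per component


def is_accessible(components: List[Component], reachable_hosts: Set[str]) -> bool:
    """Object is usable only with >50% of votes AND at least one full replica reachable."""
    total = sum(c.votes for c in components)
    alive = [c for c in components if c.host in reachable_hosts]
    has_quorum = 2 * sum(c.votes for c in alive) > total
    has_full_copy = any(c.kind == "replica" for c in alive)
    return has_quorum and has_full_copy


ftt1 = [Component("node1", "replica"), Component("node2", "replica"),
        Component("witness", "witness")]
ftt0 = [Component("node1", "replica")]

print(is_accessible(ftt1, {"node2", "witness"}))  # True: HA can restart the VM on node2
print(is_accessible(ftt1, {"node1"}))             # False: isolated node1 lost quorum
print(is_accessible(ftt0, {"node2", "witness"}))  # False: the only copy was on node1
```

The second case is the "connected back to vC but not vSAN" situation; the third is what happens to an FTT=0 (or force-provisioned) VM.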

What Storage Policy (SP) did you apply to your test VM? If none, then assign an SP with the storage rules you want applied to your data (e.g. right-click the VM, Edit Storage Policy, select an SP and apply).

So, step 1 here should be to put your cluster back together and validate that you can create FTT=1 FTM=RAID1 Objects. If you cannot, then this was the problem from the start. This can easily be validated via:

Cluster > Monitor > vSAN > Proactive tests > Proactive VM creation Test > Run

However, note that the above test uses the SP assigned as default to the vsanDatastore, so if you have been messing with that (e.g. setting Force Provisioning = true), then undo these changes or create a new standard FTT=1, FTM=RAID1 SP.

If the above test fails, it should tell you why, e.g. it needs 3 Fault Domains (e.g. nodes) but found only 2 (for instance, if you didn't configure disks for the Witness, or a node is isolated or has disks offline/unusable).

More information relating to the state of the cluster can be checked at:

Cluster > Monitor > vSAN > Health > retest

Bob