VMware Cloud Community
MJMSRI
Enthusiast

vSAN 7.0.3 Stretched Cluster - Simulating the failure of one entire data site

Hi All, 

We have a 3+3+1 vSAN Stretched Cluster: two data sites with three hosts each, plus a witness site.

We only have vSphere Standard licensing, so vSphere HA is enabled but DRS is not available.

We want to simulate a data site failure. What would be the best way to do this so that we can effectively complete a 'Test DR Site Failover'?

Would it be to log in to each host's iLO and simply power off the HPE hosts? Or is there a built-in script to initiate the vSphere HA workflow, so that it would gracefully shut down the VMs in one data site and then power them on in the remaining data site?

Thanks,


Accepted Solutions
TheBobkin
Champion

@MJMSRI, I'm not aware of any such HA script/workflow. Nonetheless, if you are trying to emulate real-world scenarios (e.g. one site loses power, or a super-eager digger cuts the ISL), then do it like that: turn all the nodes off abruptly via iLO, or pull all the cables/disable the switch ports of the NICs backing the vSAN network on those nodes.

Or use the always amusing vsish reliability crashMe injector to PSOD the nodes if you also want to test crash auto-restart functionality, which, IIRC, can be configured in iLO (this is also maybe a contentious topic, as it doesn't wait for the coredump to complete, as far as I recall).
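
For anyone who wants to script the crash option across a whole site, here is a rough dry-run sketch in Python. It only prints the command you would run on each host; it does not execute anything. The host names are hypothetical, and the vsish node shown is the commonly cited crashMe path, which I'd treat as an assumption to confirm on your own ESXi build before relying on it.

```python
# Dry-run sketch: build the command that would PSOD each host in the site under test.
# Nothing is executed; the commands are only printed for review.

# Assumption: commonly cited crashMe node; verify it exists on your ESXi build first.
CRASH_CMD = "vsish -e set /reliability/crashMe/Panic 1"


def build_crash_plan(hosts):
    """Return one ssh command line per host, in the order given."""
    return [f"ssh root@{host} '{CRASH_CMD}'" for host in hosts]


# Hypothetical host names for the data site being "failed".
site_a_hosts = [
    "esx-a1.example.local",
    "esx-a2.example.local",
    "esx-a3.example.local",
]

for line in build_crash_plan(site_a_hosts):
    print(line)
```

Printing the plan first makes it easy to double-check that only hosts in the intended site are listed before anything is crashed for real.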


2 Replies
depping
Leadership

I would recommend either the "vsish" option, as it mimics a proper crash, or an iLO hard power-off!

Do note: if you do a "clean power off" and the VMs are brought down gracefully, HA will not restart them. Hence you need to use one of the above methods to ensure a restart occurs.
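
To make that distinction concrete, here is a toy model in Python. It is not how HA is implemented, just a simplified illustration of the rule above: only VMs that were still powered on when the host failed are candidates for a restart. The class and function names are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    powered_on_at_failure: bool  # False if the VM was shut down cleanly beforehand


def expected_ha_restarts(vms):
    """Simplified rule: HA only restarts VMs that were running when the host failed."""
    return [vm.name for vm in vms if vm.powered_on_at_failure]


# Hypothetical VMs: one hit by a hard power-off, one shut down cleanly before the test.
vms = [
    VM("app01", powered_on_at_failure=True),
    VM("db01", powered_on_at_failure=False),
]

print(expected_ha_restarts(vms))
```

In this model a cleanly shut down VM never appears in the restart list, which is why a graceful shutdown does not exercise the HA failover path.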