VMware Cloud Community
Mike_Gray
Enthusiast

cluster recovery

Hello team,

Could you please help me with the steps to recover the datastore from an all-hosts-down scenario?

TheBobkin
Champion

Hello Mike

Are the hosts still down?

If yes, how was this determined? (e.g. are they disconnected from vCenter, not pingable, or not reachable via out-of-band management; are you on the DCUI and can see that they PSODed; or was there a power outage?)

If the hosts are all powered down, try to bring them all back up within as short a timespan as possible.

If the hosts have been rebooted and are back up, then check the following:

- Hosts are back in the cluster normally (esxcli vsan cluster get). If they are not clustered but *should* be able to communicate, test ping over the vSAN vmk interfaces between hosts (check Multicast too if the host version is earlier than 6.5), then try cluster leave and cluster join (check that they all have the same cluster UUID and use this).
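A minimal sketch of the membership check above, assuming you have already saved the text output of esxcli vsan cluster get from every host (how you collect it is up to you). It compares the "Sub-Cluster UUID" field across hosts and checks that each host's "Sub-Cluster Member Count" equals the number of hosts supplied. Those field names are assumed from typical output, so confirm them against your own build; the host names and UUID values in the usage block are made up.

```python
from typing import Dict, List


def parse_cluster_get(text: str) -> Dict[str, str]:
    """Turn 'Key: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def check_membership(outputs: Dict[str, str]) -> List[str]:
    """outputs maps a host name to the text of its `esxcli vsan cluster get`.

    Returns a list of problems; an empty list means every host reports the
    same Sub-Cluster UUID and sees all of the supplied hosts as members.
    Field names are assumptions based on typical output.
    """
    parsed = {host: parse_cluster_get(text) for host, text in outputs.items()}
    problems = []
    uuids = {host: f.get("Sub-Cluster UUID") for host, f in parsed.items()}
    if len(set(uuids.values())) > 1:
        problems.append("Sub-Cluster UUID differs between hosts: %s" % uuids)
    expected = str(len(outputs))
    for host, f in parsed.items():
        count = f.get("Sub-Cluster Member Count")
        if count != expected:
            problems.append("%s sees %s member(s), expected %s" % (host, count, expected))
    return problems


if __name__ == "__main__":
    # Made-up sample values (hypothetical hosts esx01/esx02) for illustration only
    sample = {
        "esx01": "Sub-Cluster UUID: 52aa\nSub-Cluster Member Count: 2\n",
        "esx02": "Sub-Cluster UUID: 52aa\nSub-Cluster Member Count: 1\n",
    }
    for problem in check_membership(sample):
        print(problem)  # esx02 sees 1 member(s), expected 2
```

The expected member count assumes every host you supply is meant to be in the cluster; if a host was deliberately removed, leave its output out.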

- All disk-groups are healthy (esxcli vsan storage list | grep -i cmmds): all disks on all hosts should say 'true'. If not, you have disks/disk-groups that will need to be looked at further (vmkernel.log and boot.gz are the place to start).
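A small sketch of the same disk check that also tells you which device failed it, assuming the full (un-grepped) output of esxcli vsan storage list has been saved to a text file. It assumes each disk's section has a "Device:" line before its "In CMMDS:" line, which is how the output is typically laid out; the file name in the usage comment is hypothetical.

```python
import sys
from typing import List, Optional


def disks_not_in_cmmds(text: str) -> List[Optional[str]]:
    """Return the Device names whose 'In CMMDS' value is not 'true'."""
    current = None  # most recent "Device:" value seen
    flagged = []
    for line in text.splitlines():
        # Split at the first colon only, so names like mpx.vmhba1:C0:T1:L0 survive
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Device":
            current = value
        elif key == "In CMMDS" and value.lower() != "true":
            flagged.append(current)
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 check_cmmds.py storage_list.txt
    with open(sys.argv[1]) as f:
        for device in disks_not_in_cmmds(f.read()):
            print("Not in CMMDS:", device)
```

Any device it prints is a starting point for the vmkernel.log and boot.gz review mentioned above.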

- All Objects are accessible and in a healthy config-state:

# cmmds-tool find -f python | grep CONFIG_STATUS -B 4 -A 6 | grep 'uuid\|content' | grep 'state\\\":' | sort | uniq -c

(All should be state 7 (Healthy) or 15 (Reconfiguring). If any Objects are inaccessible (e.g. state 12 or 13), then either some disks/disk-groups are non-functional or these Objects were potentially already in that state before the outage.)
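The same config-state tally can be done in a short script. This is only a sketch, not an official tool: it assumes the output of the pipeline above, without the final sort | uniq -c, has been saved to a file, and that each matching line contains state\": followed by a number, as the grep implies. Following the post, it treats only 7 and 15 as fine and flags every other state.

```python
import re
import sys
from collections import Counter
from typing import Dict

# Matches both `"state": 7` and the backslash-escaped form `\"state\": 7`
STATE_RE = re.compile(r'state\\?"\s*:\s*(\d+)')

# Meanings as given in the post; every other value is worth a closer look
OK_STATES = {7: "Healthy", 15: "Reconfiguring"}


def tally_states(text: str) -> Counter:
    """Count how many Objects report each config state."""
    return Counter(int(s) for s in STATE_RE.findall(text))


def needs_attention(counts: Counter) -> Dict[int, int]:
    """Keep only the states that are neither Healthy nor Reconfiguring."""
    return {state: n for state, n in counts.items() if state not in OK_STATES}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 tally_states.py config_status.txt
    with open(sys.argv[1]) as f:
        counts = tally_states(f.read())
    for state, n in sorted(counts.items()):
        print(state, OK_STATES.get(state, "CHECK"), n)
```

Counting in the script replaces sort | uniq -c, so lines that differ in other fields still land in the same bucket.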

The state of all of the above should also be shown in some form by the vSAN Health check in the Web Client, so make sure to get vCenter (and the PSC, if external) powered on as soon as possible: Cluster > Monitor > Health

Edit: re-read your question

Bob

Mike_Gray
Enthusiast

Bob,

Thanks for the update. I tested, and the vSAN cluster is up and healthy. But all the VMs went into a read-only state, even the PSC and VCSA. I tried running e2fsck but it didn't help. Can you tell me how to recover from read-only mode and how to prevent this situation?

TheBobkin
Champion

Hello Mike,

Are all Objects state 7?

Am I correct in assuming these are all Linux-based VMs? Linux VMs can become read-only if they are unable to write to disk for too long.
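To confirm which filesystems inside a Linux guest are actually read-only, here is a sketch that reads /proc/mounts and lists disk-backed mounts whose options include ro. Only counting devices under /dev/ is a heuristic assumption to skip pseudo-filesystems. The options are split on commas so that errors=remount-ro (a common ext4 option, and one way a guest ends up read-only after I/O errors) is not mistaken for an actual read-only mount.

```python
import os
from typing import List, Tuple


def read_only_mounts(mounts_text: str) -> List[Tuple[str, str, str]]:
    """Return (mount point, device, fstype) for /dev/-backed mounts with 'ro'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        device, mount_point, fstype, options = fields[:4]
        # Exact token match: 'errors=remount-ro' must not count as read-only
        if device.startswith("/dev/") and "ro" in options.split(","):
            found.append((mount_point, device, fstype))
    return found


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        for mount_point, device, fstype in read_only_mounts(f.read()):
            print("%s (%s, %s) is mounted read-only" % (mount_point, device, fstype))
```

Run it inside the guest; an empty result after a reboot suggests the filesystems came back read-write.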

If all Objects are healthy, reboot the read-only VMs if they did not go down; you could also try remounting the filesystem if applicable:

kb.vmware.com/kb/51306

Regarding the VCSA and PSC, they could have encountered this known issue:

kb.vmware.com/kb/2149838

Bob
