cluster recovery


1. cluster recovery

Mike_Gray
Posted Sep 11, 2017 06:18 PM

Hello team,
Could you please give me the steps to recover a datastore from an all-hosts-down scenario?
2. RE: cluster recovery

TheBobkin
Posted Sep 11, 2017 06:35 PM

Hello Mike
Are the hosts still down?
If yes, how was this determined? (For example: are they disconnected from vCenter, not pingable, or not reachable via out-of-band management? Can you see a PSOD on the DCUI? Was there a power outage?)
If the hosts are all powered down, try to bring them all back up within as short a timespan as possible.
If the hosts have been rebooted and are back up, check the following:
- Hosts are back in the cluster normally (esxcli vsan cluster get). If they are not clustered, test ping over the vSAN vmk interfaces between hosts (and check Multicast too if the host version is <6.5). If they *should* be able to communicate but still do not cluster, try cluster leave and cluster join (check that all hosts have the same cluster UUID and use it when rejoining).
- All disk-groups are healthy (esxcli vsan storage list | grep -i cmmds). All disks on all hosts should say 'true'; if not, you have disks/disk-groups that need to be looked at further (vmkernel.log and boot.gz are the places to start).
- All Objects are accessible and in a healthy config-state:
# cmmds-tool find -f python | grep CONFIG_STATUS -B 4 -A 6 | grep 'uuid\|content' | grep 'state\\\":' | sort | uniq -c
(All Objects should be state 7 (Healthy) or 15 (Reconfiguring). If any Objects are inaccessible (e.g. state 12 or 13), then either some disks/disk-groups are non-functional or these Objects were potentially already inaccessible prior to the outage.)
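The three checks above can be summarised with a few small helpers. This is a minimal sketch, not a supported tool: the function names are made up for illustration, each function only reads command output on stdin, and the parsing assumes that `esxcli vsan cluster get` prints a `Sub-Cluster UUID:` line, that `esxcli vsan storage list` prints an `In CMMDS:` line per disk, and that the CONFIG_STATUS content lines carry the state as `state": N` or `state\": N`. Check these against the real output on your build before relying on them.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that summarise output from the checks above.
# Every function reads text on stdin; none of them runs esxcli or cmmds-tool.

# Print the Sub-Cluster UUID from `esxcli vsan cluster get` output.
# ASSUMPTION: the output contains a line "Sub-Cluster UUID: <uuid>".
sub_cluster_uuid() {
    awk -F': ' '/Sub-Cluster UUID:/ { print $2 }'
}

# Count disks in `esxcli vsan storage list` output whose "In CMMDS" value
# is anything other than true. Prints 0 when every disk is in CMMDS.
count_disks_not_in_cmmds() {
    awk 'tolower($0) ~ /in cmmds:/ && tolower($0) !~ /true/ { n++ }
         END { print n + 0 }'
}

# Tally object states from CONFIG_STATUS "content" lines.
# ASSUMPTION: the state appears as  state": N  or  state\": N  on each line.
# Output: one "<state> <count>" pair per line, sorted by state number.
tally_object_states() {
    sed -n 's/.*state\\\{0,1\}": *\([0-9][0-9]*\).*/\1/p' \
        | sort -n | uniq -c | awk '{ print $2, $1 }'
}

# From the tally above, count objects that are neither 7 (Healthy)
# nor 15 (Reconfiguring).
count_unhealthy_objects() {
    awk '$1 != 7 && $1 != 15 { n += $2 } END { print n + 0 }'
}
```

On a host this might be used as `esxcli vsan cluster get | sub_cluster_uuid` (compare the value across hosts), `esxcli vsan storage list | count_disks_not_in_cmmds`, and `cmmds-tool find -f python | grep CONFIG_STATUS -B 4 -A 6 | grep content | tally_object_states | count_unhealthy_objects`. A non-zero result from either counter means something needs a closer look.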
The state of all of the above should also be shown in some form by the vSAN Health check in the Web Client, so make sure to get vCenter (and the PSC, if external) powered on ASAP: Cluster > Monitor > Health
Edit: re-read your question
Bob
3. RE: cluster recovery

Mike_Gray
Posted Sep 12, 2017 05:03 PM

Bob,
Thanks for the update. I tested, and the whole vSAN cluster is up and healthy. But all the VMs went into a read-only state, even the PSC and VCSA. I tried running e2fsck but it didn't help. Can you tell me how to recover from read-only mode and how to prevent this situation?
4. RE: cluster recovery

TheBobkin
Posted Sep 12, 2017 06:57 PM

Hello Mike,
Are all Objects state 7?
Am I correct in assuming these are all Linux-based VMs? Linux VMs can become read-only if they are unable to write to disk for too long.
If all Objects are healthy, reboot the read-only VMs if they did not go down; you could also try remounting the filesystem if applicable:
kb.vmware.com/kb/51306
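To see which filesystems inside a Linux guest are actually mounted read-only before attempting a remount, something like the following may help. It is a rough sketch, not taken from the KB: `list_ro_mounts` is a hypothetical helper that reads `/proc/mounts`-format lines on stdin and only considers `/dev/...` devices, since pseudo-filesystems are often read-only by design.

```shell
#!/bin/sh
# Sketch only: list mount points of block-device filesystems mounted read-only.
# Input format (as in /proc/mounts): device mountpoint fstype options dump pass
list_ro_mounts() {
    awk '$1 ~ /^\/dev\// {
        n = split($4, opt, ",")
        # Match the exact option "ro", not substrings such as errors=remount-ro.
        for (i = 1; i <= n; i++) {
            if (opt[i] == "ro") { print $2; break }
        }
    }'
}
```

Run it in the guest as `list_ro_mounts < /proc/mounts`. For each mount point it prints, a remount can be attempted as root with `mount -o remount,rw <mountpoint>`, but only once the underlying Objects are confirmed healthy. If the filesystem has recorded errors, the remount may not succeed and a filesystem check from a rescue environment may be needed first.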
Regarding the VCSA and PSC, they could have encountered this known issue:
kb.vmware.com/kb/2149838
Bob
