Re: recover vsan

Mike_Gray · 10-27-2017

Can we recover the vSAN datastore from the state below?

	1	"content": "{\"state\": 12, \"CSN\": 1, \"SCSN\": 4}",
	1	"content": "{\"state\": 12, \"CSN\": 13, \"SCSN\": 16}",
	1	"content": "{\"state\": 12, \"CSN\": 17, \"SCSN\": 20}",
	2	"content": "{\"state\": 12, \"CSN\": 18, \"SCSN\": 21}",
	8	"content": "{\"state\": 12, \"CSN\": 19, \"SCSN\": 22}",
	7	"content": "{\"state\": 12, \"CSN\": 20, \"SCSN\": 23}",
	3	"content": "{\"state\": 12, \"CSN\": 22, \"SCSN\": 25}",
	1	"content": "{\"state\": 12, \"CSN\": 23, \"SCSN\": 26}",
	3	"content": "{\"state\": 12, \"CSN\": 24, \"SCSN\": 27}",
	5	"content": "{\"state\": 12, \"CSN\": 25, \"SCSN\": 28}",
	3	"content": "{\"state\": 12, \"CSN\": 26, \"SCSN\": 29}",
	2	"content": "{\"state\": 12, \"CSN\": 27, \"SCSN\": 30}",
	6	"content": "{\"state\": 12, \"CSN\": 29, \"SCSN\": 32}",
	2	"content": "{\"state\": 12, \"CSN\": 30, \"SCSN\": 33}",
	2	"content": "{\"state\": 12, \"CSN\": 31, \"SCSN\": 34}",
	2	"content": "{\"state\": 12, \"CSN\": 4, \"SCSN\": 7}",
	1	"content": "{\"state\": 12, \"CSN\": 4, \"SCSN\": 8}",
	5	"content": "{\"state\": 12, \"CSN\": 5, \"SCSN\": 8}",
	1	"content": "{\"state\": 12, \"CSN\": 5, \"SCSN\": 9}",
	1	"content": "{\"state\": 12, \"CSN\": 6, \"SCSN\": 9}",
	2	"content": "{\"state\": 13, \"CSN\": 1, \"SCSN\": 4}",
	1	"content": "{\"state\": 13, \"CSN\": 11, \"SCSN\": 14}",
	2	"content": "{\"state\": 13, \"CSN\": 12, \"SCSN\": 15}",
	1	"content": "{\"state\": 13, \"CSN\": 13, \"SCSN\": 16}",
	1	"content": "{\"state\": 13, \"CSN\": 16, \"SCSN\": 19}",
	1	"content": "{\"state\": 13, \"CSN\": 17, \"SCSN\": 20}",
	10	"content": "{\"state\": 13, \"CSN\": 18, \"SCSN\": 21}",
	14	"content": "{\"state\": 13, \"CSN\": 19, \"SCSN\": 22}",
	7	"content": "{\"state\": 13, \"CSN\": 20, \"SCSN\": 23}",
	2	"content": "{\"state\": 13, \"CSN\": 21, \"SCSN\": 24}",
	2	"content": "{\"state\": 13, \"CSN\": 22, \"SCSN\": 25}",
	2	"content": "{\"state\": 13, \"CSN\": 24, \"SCSN\": 27}",
	8	"content": "{\"state\": 13, \"CSN\": 25, \"SCSN\": 28}",
	8	"content": "{\"state\": 13, \"CSN\": 26, \"SCSN\": 29}",
	11	"content": "{\"state\": 13, \"CSN\": 27, \"SCSN\": 30}",
	4	"content": "{\"state\": 13, \"CSN\": 28, \"SCSN\": 31}",
	7	"content": "{\"state\": 13, \"CSN\": 29, \"SCSN\": 32}",
	2	"content": "{\"state\": 13, \"CSN\": 3, \"SCSN\": 6}",
	9	"content": "{\"state\": 13, \"CSN\": 30, \"SCSN\": 33}",
	5	"content": "{\"state\": 13, \"CSN\": 31, \"SCSN\": 34}",
	1	"content": "{\"state\": 13, \"CSN\": 32, \"SCSN\": 35}",
	1	"content": "{\"state\": 13, \"CSN\": 38, \"SCSN\": 41}",
	10	"content": "{\"state\": 13, \"CSN\": 4, \"SCSN\": 7}",
	2	"content": "{\"state\": 13, \"CSN\": 4, \"SCSN\": 8}",
	1	"content": "{\"state\": 13, \"CSN\": 5, \"SCSN\": 10}",
	11	"content": "{\"state\": 13, \"CSN\": 5, \"SCSN\": 8}",
	3	"content": "{\"state\": 13, \"CSN\": 6, \"SCSN\": 11}",
	1	"content": "{\"state\": 13, \"CSN\": 6, \"SCSN\": 9}",
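To summarize a list like this, a minimal Python sketch can total the objects per state. It assumes (this is not stated in the post) that the leading number on each line is how many objects share that exact status, as `uniq -c` would print it, and that each `content` value is a JSON-encoded string like the ones shown. `tally_states` and `LINE_RE` are hypothetical names.

```python
import json
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed line shape (as in the post): a count, whitespace, then a
# "content" field whose value is a JSON-encoded string of the status.
LINE_RE = re.compile(r'^\s*(\d+)\s+"content":\s*(".*")\s*,?\s*$')


def tally_states(lines: Iterable[str]) -> Dict[int, int]:
    """Sum the leading counts into a total number of objects per state."""
    totals: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unrelated lines
        count = int(match.group(1))
        # The value is a JSON string that itself contains JSON,
        # so it has to be decoded twice.
        status = json.loads(json.loads(match.group(2)))
        totals[status["state"]] += count
    return dict(totals)
```

It accepts any iterable of lines, such as an open file holding the pasted list. Under the counting assumption above, the full list in this post totals 57 objects in state 12 and 129 in state 13.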

TheBobkin · 10-27-2017

Hello Mike,

State 12 Objects: no recovery is possible, as they have no complete and healthy data replica.

State 13 Objects are inaccessible (no quorum) but *should* have a healthy data replica, which *may* be recoverable with VMware GSS assistance (or via the potentially risky setAttr FTT=0 route, which may render the Objects unrecoverable by any means).
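The two cases above follow from the rule vSAN applies to a RAID1 object: it stays accessible only while more than 50% of its votes are available and at least one complete, healthy data replica survives. Below is a minimal Python sketch of that rule. It assumes one vote per component (vSAN can assign extra votes, so real objects may differ) and folds staleness into a single `healthy` flag. `Component` and `assess_raid1_object` are hypothetical names, not a vSAN API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    kind: str       # "data" for a full RAID1 replica, "witness" otherwise
    healthy: bool   # True if the component is present and up to date
    votes: int = 1  # assumption: one vote each; vSAN may assign more


def assess_raid1_object(components: List[Component]) -> str:
    """Apply the two access conditions: >50% of votes and one healthy replica."""
    total_votes = sum(c.votes for c in components)
    live_votes = sum(c.votes for c in components if c.healthy)
    has_replica = any(c.kind == "data" and c.healthy for c in components)
    if not has_replica:
        return "no healthy replica"          # the case described for State 12
    if 2 * live_votes > total_votes:
        return "accessible"
    return "replica present, no quorum"      # the case described for State 13


# FTT=1 RAID1 after a power loss: one replica back, the other replica
# and the witness still absent.
obj = [Component("data", True), Component("data", False),
       Component("witness", False)]
print(assess_raid1_object(obj))  # replica present, no quorum
```

Bringing back either the witness or the second replica in this example lifts the live votes to 2 of 3, which is why getting more components healthy comes before anything else.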

What Storage Policy was applied to these Objects? (e.g. FTT=0 or FTT=1, FTM=RAID1 or FTM=RAID5)

What happened in this cluster to result in this state?

What have you done so far to try and get more components accessible and healthy?

Bob

Mike_Gray · 10-27-2017

FTT=1. All the hosts went down due to a power failure.

TheBobkin · 10-27-2017

Hello Mike,

"FTT=1"

With RAID1 or RAID5 as the Fault Tolerance Method (FTM)? (I ask because this can affect what can be inferred from config-status.)

"all the host went down due to power failure"

Things that need to be checked:

- Are all hosts up? (And have they completed boot, including what can be a long 'SSD Initialization' process after an abrupt shutdown?)

- Are they all participating in the cluster?

# esxcli vsan cluster get (run on all hosts)

- Are all hosts out of vSAN Maintenance Mode?

# cmmds-tool find -t NODE_DECOM_STATE -f json (run from any host; all entries should be 0: 0=Not in MM, 4=entering MM, 6=in MM)

- Are all disk groups mounted correctly and are all disks in CMMDS?

# esxcli vsan storage list | grep -i cmmds (run on all hosts)
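Once the checks above have been run, their output can be scanned with a short Python sketch on any machine with Python 3. The field names `Sub-Cluster Member Count` and `In CMMDS` are taken from typical `esxcli` output. The NODE_DECOM_STATE JSON layout (an `entries` list whose `content` carries `decomState`) is an assumption, so verify it against your own output. All function names are hypothetical.

```python
import json
import re
from typing import List

# Decom states as listed in the thread: 0=Not in MM, 4=entering MM, 6=in MM
DECOM_STATES = {0: "not in MM", 4: "entering MM", 6: "in MM"}


def member_count(cluster_get_output: str) -> int:
    """Read 'Sub-Cluster Member Count' from `esxcli vsan cluster get` text."""
    match = re.search(r"Sub-Cluster Member Count:\s*(\d+)", cluster_get_output)
    if match is None:
        raise ValueError("no member count found; is this host in a vSAN cluster?")
    return int(match.group(1))


def hosts_not_out_of_mm(decom_json: str) -> List[str]:
    """List 'owner: state' for each NODE_DECOM_STATE entry that is not 0.

    Assumed layout: {"entries": [{"owner": ..., "content": {...}}]} with a
    "decomState" field; "content" may be a dict or a JSON-encoded string.
    """
    problems = []
    for entry in json.loads(decom_json).get("entries", []):
        content = entry.get("content", {})
        if isinstance(content, str):
            content = json.loads(content)
        state = content.get("decomState")
        if state != 0:
            label = DECOM_STATES.get(state, f"unknown ({state})")
            problems.append(f"{entry.get('owner', '?')}: {label}")
    return problems


def disks_missing_from_cmmds(grep_output: str) -> int:
    """Count 'In CMMDS: false' lines from `esxcli vsan storage list | grep -i cmmds`."""
    return sum(1 for line in grep_output.splitlines()
               if line.strip().lower() == "in cmmds: false")
```

Compare `member_count()` from each host with the number of hosts you expect in the cluster. Any entry returned by `hosts_not_out_of_mm()`, or a non-zero `disks_missing_from_cmmds()`, points at the item to fix before looking at the objects again.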

Bob
