VMware Cloud Community
Sateesh_vCloud

3 VMs went into an inaccessible state after a planned power outage

We have a vSAN-enabled cluster for the Engineering team. The site had power maintenance, which forced us to shut down the cluster using the procedure below.

We followed the VMware article: Shutting down and powering on a vSAN 6.x Cluster when vCenter Server is running on top of vSAN (2142...

Later, we powered on the cluster and noticed that 3 VMs are inaccessible, and we can't find them in the vSAN datastore folder.

Any thoughts? We are on the latest vSAN, 6.6.1.

------------------------------------------------------------------------- Follow me @ www.vmwareguruz.com Please consider marking this answer "correct" or "helpful" if you found it useful T. Sateesh VCIX-NV, VCAP 5-DCA/DCD,VCP 6-NV,VCP 5 DCV/Cloud/DT, ZCP IBM India Pvt. Ltd
3 Replies
TheBobkin
Champion

Hello Sateesh,

Are you positive that nothing was resyncing and that all Objects were compliant with their Storage Policy? Did all hosts enter Maintenance Mode successfully within a few minutes of one another?

If VMs are inaccessible and their namespace folder cannot be found on vsanDatastore then it is likely that their namespace Objects are unhealthy and thus inaccessible.

Can you confirm that all hosts are successfully out of vSAN Maintenance Mode? (This may vary from what they show in vCenter Maintenance Mode):

# cmmds-tool find -t NODE_DECOM_STATE -f json

(This should show state 0 for each host's Decom state; post the output if unsure)
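If the JSON output is long, a small script can pick out any host whose Decom state is not 0. This is only a rough sketch: the top-level `entries` list and the `content.decomState` field are assumptions about the output layout, and the sample data (including the UUIDs and the non-zero value) is made up for illustration, so adjust the names to whatever your hosts actually print.

```python
import json


def hosts_still_in_decom(cmmds_json_text):
    """Return (uuid, decomState) pairs for entries whose decomState is not 0."""
    data = json.loads(cmmds_json_text)
    # Assumption: output is {"entries": [...]}; fall back to a bare list.
    entries = data.get("entries", []) if isinstance(data, dict) else data
    stuck = []
    for entry in entries:
        content = entry.get("content", {})
        # Assumption: content may arrive as an embedded JSON string.
        if isinstance(content, str):
            content = json.loads(content)
        state = content.get("decomState")
        if state != 0:
            stuck.append((entry.get("uuid"), state))
    return stuck


# Made-up sample, not real cluster output.
sample = """{"entries": [
  {"uuid": "host-a", "type": "NODE_DECOM_STATE", "content": {"decomState": 0}},
  {"uuid": "host-b", "type": "NODE_DECOM_STATE", "content": {"decomState": 6}}
]}"""

print(hosts_still_in_decom(sample))
```

An empty list would mean every host reports state 0.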

What is the output of this command when run on a host?:

# cmmds-tool find -f python | grep CONFIG_STATUS -B 4 -A 6 | grep 'uuid\|content' | grep 'state\\\":' | sort | uniq -c

(Apologies in advance: the output will likely not be formatted correctly into 3-4 lines, as I don't have the 6.5 version of this command handy. The command shows the config status of every Object on the vsanDatastore, e.g. State: 7 is healthy and State: 12 is a type of inaccessible.)
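If the grep pipeline formats badly on your build, the same tally can be done offline on a saved copy of the output. The sketch below is an approximation, not the official tool: it assumes the output still contains the escaped `state\":` key that the grep above looks for, and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Matches state": N with any number of backslashes before the quote.
STATE_RE = re.compile(r'state\\*"\s*:\s*(\d+)')


def tally_states(text):
    """Count how many times each Object state value appears in the text."""
    return Counter(int(value) for value in STATE_RE.findall(text))


# Made-up sample lines, not real cluster output.
sample = r'''
"content": "{\"type\": \"Configuration\", \"state\": 7}"
"content": "{\"type\": \"Configuration\", \"state\": 7}"
"content": "{\"type\": \"Configuration\", \"state\": 12}"
'''

for state, count in sorted(tally_states(sample).items()):
    print(f"state {state}: {count} object(s)")
```

Anything other than state 7 in the tally would be worth a closer look.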

  

Do you have a support agreement with VMware for this cluster?

If so then I advise opening a Support Request.

Bob

Sateesh_vCloud

I confirm there were no pending operations before the cluster maintenance, and I noticed the following from RVC:

2017-08-24 16:16:00 -0700: Step 1: Check for inaccessible vSAN objects

Detected 0 objects to be inaccessible

2017-08-24 16:16:01 -0700: Step 2: Check for invalid/inaccessible VMs

Detected VM 'Autopology' as being 'inaccessible'

Detected VM 'ubuntu-16.04-server-cloudimg-amd64.ova' as being 'inaccessible'

Detected VM 'vROPS-6.5' as being 'inaccessible'
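For anyone who wants to pull the affected VM names out of a saved copy of this RVC output, here is a minimal sketch. It assumes only the "Detected VM '...' as being '...'" line format shown above; the function name is hypothetical.

```python
import re

# Line format taken from the RVC output quoted above.
VM_RE = re.compile(r"Detected VM '(.+)' as being '([^']+)'")


def flagged_vms(report_text):
    """Return (vm_name, reported_state) pairs found in the report text."""
    return [(m.group(1), m.group(2)) for m in VM_RE.finditer(report_text)]


report = """\
2017-08-24 16:16:00 -0700: Step 1: Check for inaccessible vSAN objects
Detected 0 objects to be inaccessible
2017-08-24 16:16:01 -0700: Step 2: Check for invalid/inaccessible VMs
Detected VM 'Autopology' as being 'inaccessible'
Detected VM 'ubuntu-16.04-server-cloudimg-amd64.ova' as being 'inaccessible'
Detected VM 'vROPS-6.5' as being 'inaccessible'
"""

for name, state in flagged_vms(report):
    print(f"{name}: {state}")
```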

Output attached for the given commands:

We do not have support, as this is lab infrastructure for the Dev/Ops team.

TheBobkin
Champion

Hello Sateesh,

Well, that is very strange: all nodes are out of vSAN MM and all Objects are in state 7 (should be accessible), so no VMs should be inaccessible. (As I said, though, that is the 6.2 version of the script, so it is possible that some state is not being shown, on top of the shoddy formatting.) Are all disk-groups healthy? (esxcli vsan storage list | grep -i cmmds)

Please log into RVC (from your command output it seems you are there already) via the vCSA/Windows vCenter and run this against the cluster:

> vsan.check_state -r <pathToCluster>

This should owner-abdicate all Objects and, if they are healthy, make them accessible.

If this does not resolve the issue, then please attach the output of vsan.obj_status_report -t <pathToCluster>

Bob
