VMware Cloud Community
HockeyFan04
Contributor

vSAN object Health: Inaccessible Objects

Hello All,

I have 35 inaccessible Virtual Objects within my vSAN cluster. From the looks of it, they appear to be the local disks within the servers. Does anyone know how to remedy this?

0 Kudos
3 Replies
TheBobkin
Champion

@HockeyFan04 , Objects can be Inaccessible for a number of reasons, so it is not so simple that a catch-all remedy could be provided.

Best thing to start with is vSAN Health UI and some initial aspects that should be checked:

Are there disks marked as failed/offline?

Is the cluster partitioned?

Are there nodes that are not in the cluster?

Is there max level congestion (of any kind) reported on any Disk-Groups?

Are there disks/Disk-Groups that have run out of space?

Are there nodes in Maintenance Mode (either vSAN MM or vSphere+vSAN MM)?

If none of the above apply, then I would advise checking 1. what the Objects are and 2. where the absent/degraded components reside (e.g. if the data is FTT=1 then there would have to be at least 2 components unavailable or stale to cause inaccessibility).
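To make the FTT=1 point concrete, here is a minimal sketch of the majority-of-votes rule that decides accessibility. The component states are fabricated (not real vSAN output) and each component is simplified to one vote:

```shell
# Hypothetical FTT=1 (RAID-1) object: two data components plus a witness.
# The object stays accessible only while a majority of votes are ACTIVE,
# which is why a single ABSENT component does not make it inaccessible.
components="ACTIVE ABSENT ACTIVE"

total=0
active=0
for state in $components; do
  total=$((total + 1))
  if [ "$state" = "ACTIVE" ]; then
    active=$((active + 1))
  fi
done

# Majority check: accessible if active votes > half of total votes
if [ $((active * 2)) -gt "$total" ]; then
  echo "Object accessible ($active/$total components ACTIVE)"
else
  echo "Object INACCESSIBLE ($active/$total components ACTIVE)"
fi
```

With two of three components ACTIVE the object stays up; lose a second component (e.g. 2 ABSENT) and it drops below quorum, matching the "at least 2 components unavailable" condition above.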

0 Kudos
HockeyFan04
Contributor

@TheBobkin 

Are there disks marked as failed/offline? If I dig down under the "Virtual Objects" section and use the "View Placement Details" selection for all 35 objects, it states that the component state for most of them is "Absent".

Is the cluster partitioned? Not sure what you mean by this.

Are there nodes that are not in the cluster? Every node we have is in the vSAN cluster and contributing stats correctly now.

Is there max level congestion (of any kind) reported on any Disk-Groups? Not that I can see.

Are there disks/Disk-Groups that have run out of space? No, but the Skyline Health check "vSAN Disk Balance" is in "Warning" state and says that "Proactive rebalance is needed".

Are there nodes in Maintenance Mode (either vSAN MM or vSphere+vSAN MM)? No, none of the nodes/hosts are in maintenance mode.

0 Kudos
TheBobkin
Champion

@HockeyFan04, Regarding disks being failed/offline, I meant vSAN disks (e.g. Cache-tier/Capacity-tier devices); again, this will be shown in the Health UI (or in the Disk Management UI: Cluster > Configure > Disk Management). You can also determine which disks/Disk-Groups/nodes are the problem by looking at a few Inaccessible Objects and noting where the Absent/Degraded components reside (e.g. multiple Inaccessible Objects' Absent/Degraded components having particular disks/Disk-Groups/nodes in common).
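As a rough illustration of that "look for what the Absent components have in common" step, here is a sketch over fabricated placement lines (object, component state, disk UUID — standing in for what you would note down from the per-object placement details, not real cmmds output):

```shell
# Fabricated placement data: three inaccessible objects whose ABSENT
# components all land on the same capacity disk, making it the suspect.
placements='obj-1 ABSENT 52dd-aaaa
obj-1 ACTIVE 52dd-bbbb
obj-2 ABSENT 52dd-aaaa
obj-2 ACTIVE 52dd-cccc
obj-3 ABSENT 52dd-aaaa'

# Count which disk UUID appears most often among ABSENT components
suspect=$(printf '%s\n' "$placements" \
  | awk '$2 == "ABSENT" {print $3}' | sort | uniq -c | sort -rn | head -n 1)
echo "Most common disk among ABSENT components:$suspect"
```

If one disk/Disk-Group/node dominates the count like this, that is where to focus (check its physical state, controller, and logs) rather than chasing each object individually.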

By cluster partitioned I was asking whether some nodes are not communicating with other nodes in the cluster, e.g. they are in their own network partition.
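One quick way to spot a partition is to compare the member count reported by 'esxcli vsan cluster get' on each host against the expected node count. A minimal sketch of that check, run here against illustrative sample output (the UUID and counts are made up) rather than a live host:

```shell
# Illustrative excerpt of 'esxcli vsan cluster get' output; on a real host
# this would come from running the command directly. In a healthy 4-node
# cluster, every host should report Member Count = 4 and the same
# Sub-Cluster UUID.
sample='   Sub-Cluster UUID: 52a1b2c3-d4e5-f6a7-b8c9-d0e1f2a3b4c5
   Sub-Cluster Membership Entry Revision: 10
   Sub-Cluster Member Count: 3'

expected_nodes=4
member_count=$(printf '%s\n' "$sample" | sed -n 's/.*Sub-Cluster Member Count: //p')

if [ "$member_count" -lt "$expected_nodes" ]; then
  echo "Possible partition: only $member_count of $expected_nodes nodes in this sub-cluster"
else
  echo "All $expected_nodes nodes present in the sub-cluster"
fi
```

If different hosts report different member counts or different Sub-Cluster UUIDs, the cluster is partitioned and that (usually a vSAN network issue) should be fixed before anything else.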

Disk Balance in warning state is not of consequence with regard to data availability.

With regard to MM - while the hosts may not show vSphere MM as enabled in the UI (e.g. the host icon would look different), it is important to validate that they are also out of vSAN MM (for which there is a Health check if the two are not in sync):
https://kb.vmware.com/s/article/51464

What build version of ESXi and vCenter is installed?
Did any changes to the cluster precede this state? (e.g. disks or nodes physically removed/replaced, nodes put in MM or rebooted, nodes updated/upgraded, network changes etc.)

On any node in the cluster (provided the cluster is fully-formed - validate this with the node count from 'esxcli vsan cluster get'), run the following command to identify how many Objects are in which unhealthy state, e.g. state: 12 and state: 13 are two of the expected Inaccessible states (remove all leading '#' of course):
# cmmds-tool find -f python | grep CONFIG_STATUS -B 4 -A 6 | grep 'uuid\|content' | grep -o 'state\\\":\ [0-9]*' | sort | uniq -c
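To show what the tail end of that pipeline produces, here is the same sort | uniq -c counting applied to fabricated 'state' lines (on a real host these come out of the cmmds-tool grep chain above; the state numbers here are made up for illustration):

```shell
# Fabricated per-object state lines standing in for the grep output of
# the cmmds-tool pipeline; sort must precede uniq -c, since uniq only
# collapses adjacent duplicate lines.
counts=$(printf 'state: 7\nstate: 12\nstate: 12\nstate: 13\nstate: 12\n' \
  | grep -o 'state: [0-9]*' | sort | uniq -c | sort -rn)
echo "$counts"
```

The output is a count per state (here, 3 objects in state 12 and 1 each in states 7 and 13), which tells you at a glance how many Inaccessible Objects fall into each category before digging into individual objects.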

Then run the following 2 commands (or whichever is applicable e.g. if all Inaccessible are state: 12 then only the 1st one, if a mix of 12 and 13 then both etc.) and either share or PM me the output:

# for i in `cmmds-tool find -f python|grep FIG_STAT -C6|grep 'uuid\|content'|grep 'state\\\": 12' -B1|grep uuid|cut -d "\"" -f4`;do echo;echo "DOM_UUID: $i";echo;cmmds-tool find -f json -t DOM_OBJECT -u $i|grep content |sed 's/,/\n/g'|grep -E "componentUuid|componentState|faultDomain|diskUuid|type"|grep -v StateTS|sed 's/\}//g;s/{//g;s/"//g;s/content: //g;s/,//g;'|sed 's/attributes: //g'|grep -vE "Configuration|RAID"|sed 's/Witness/Witness /g' |sed 'N;N;N;N;s/\n/ /g';echo;python -c "print('*' * 124)";done

# for i in `cmmds-tool find -f python|grep FIG_STAT -C6|grep 'uuid\|content'|grep 'state\\\": 13' -B1|grep uuid|cut -d "\"" -f4`;do echo;echo "DOM_UUID: $i";echo;cmmds-tool find -f json -t DOM_OBJECT -u $i|grep content |sed 's/,/\n/g'|grep -E "componentUuid|componentState|faultDomain|diskUuid|type"|grep -v StateTS|sed 's/\}//g;s/{//g;s/"//g;s/content: //g;s/,//g;'|sed 's/attributes: //g'|grep -vE "Configuration|RAID"|sed 's/Witness/Witness /g' |sed 'N;N;N;N;s/\n/ /g';echo;python -c "print('*' * 124)";done

I am not aware of how good this forum platform (Khoros) is at not introducing line-breaks, so do validate that each command is a single line.

0 Kudos