VMware Cloud Community
JeremeyWise
Enthusiast

vSAN Fails After Restart: VMs Listed as /vmfs/volumes/vsan:

Three-node cluster, each host with two SSDs: the smaller one for cache and the 1TB one for data. I noticed that rebooting or shutting down a host was taking hours, so I decided to shut the cluster down to see what was going on. On boot, the vSAN volume is listed, but no VMs are listed, only a bunch of folders.

Ex: /vmfs/volumes/vsan:52f2089b18190833-66e05c9b09f7312b/c133f760-0868-d0cf-c8d6-a0423f377a7e/vCLS (2709).vmx

Status from each host:

[root@thor:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2021-08-10T11:31:48Z
Local Node UUID: 60f584a0-1d04-3c42-154b-a0423f377a7e
Local Node Type: NORMAL
Local Node State: BACKUP
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 60f591d6-36a6-1390-8715-98be9459fea0
Sub-Cluster Backup UUID: 60f584a0-1d04-3c42-154b-a0423f377a7e
Sub-Cluster UUID: 52f2089b-1819-0833-66e0-5c9b09f7312b
Sub-Cluster Membership Entry Revision: 4
Sub-Cluster Member Count: 3
Sub-Cluster Member UUIDs: 60f591d6-36a6-1390-8715-98be9459fea0, 60f584a0-1d04-3c42-154b-a0423f377a7e, 60f58b67-5dc7-eada-b179-a0423f35e8ee
Sub-Cluster Member HostNames: medusa.penguinpages.local, thor.penguinpages.local, odin.penguinpages.local
Sub-Cluster Membership UUID: ef3b1261-aeb8-b23e-d666-98be9459fea0
Unicast Mode Enabled: true
Maintenance Mode State: OFF
Config Generation: d55a702d-74ec-462f-9f64-9a3e7a82144e 13 2021-07-23T03:22:44.339
Mode: REGULAR

####

[root@odin:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2021-08-10T11:31:52Z
Local Node UUID: 60f58b67-5dc7-eada-b179-a0423f35e8ee
Local Node Type: NORMAL
Local Node State: AGENT
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 60f591d6-36a6-1390-8715-98be9459fea0
Sub-Cluster Backup UUID: 60f584a0-1d04-3c42-154b-a0423f377a7e
Sub-Cluster UUID: 52f2089b-1819-0833-66e0-5c9b09f7312b
Sub-Cluster Membership Entry Revision: 4
Sub-Cluster Member Count: 3
Sub-Cluster Member UUIDs: 60f591d6-36a6-1390-8715-98be9459fea0, 60f584a0-1d04-3c42-154b-a0423f377a7e, 60f58b67-5dc7-eada-b179-a0423f35e8ee
Sub-Cluster Member HostNames: medusa.penguinpages.local, thor.penguinpages.local, odin.penguinpages.local
Sub-Cluster Membership UUID: ef3b1261-aeb8-b23e-d666-98be9459fea0
Unicast Mode Enabled: true
Maintenance Mode State: ON
Config Generation: d55a702d-74ec-462f-9f64-9a3e7a82144e 13 2021-08-06T20:38:20.172
Mode: REGULAR

###

[root@medusa:~] esxcli vsan cluster get
Cluster Information
Enabled: true
Current Local Time: 2021-08-10T11:33:43Z
Local Node UUID: 60f591d6-36a6-1390-8715-98be9459fea0
Local Node Type: NORMAL
Local Node State: MASTER
Local Node Health State: HEALTHY
Sub-Cluster Master UUID: 60f591d6-36a6-1390-8715-98be9459fea0
Sub-Cluster Backup UUID: 60f584a0-1d04-3c42-154b-a0423f377a7e
Sub-Cluster UUID: 52f2089b-1819-0833-66e0-5c9b09f7312b
Sub-Cluster Membership Entry Revision: 4
Sub-Cluster Member Count: 3
Sub-Cluster Member UUIDs: 60f591d6-36a6-1390-8715-98be9459fea0, 60f584a0-1d04-3c42-154b-a0423f377a7e, 60f58b67-5dc7-eada-b179-a0423f35e8ee
Sub-Cluster Member HostNames: medusa.penguinpages.local, thor.penguinpages.local, odin.penguinpages.local
Sub-Cluster Membership UUID: ef3b1261-aeb8-b23e-d666-98be9459fea0
Unicast Mode Enabled: true
Maintenance Mode State: ON
Config Generation: d55a702d-74ec-462f-9f64-9a3e7a82144e 13 2021-08-06T20:38:20.236
Mode: REGULAR
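
Comparing the three outputs above by eye is error-prone, and one line differs between hosts: Maintenance Mode State is OFF on thor but ON on odin and medusa. The minimal POSIX shell sketch below assumes each host's "esxcli vsan cluster get" output has first been saved to a text file. It pulls out the node role and the vSAN Maintenance Mode State for each file. The helper names vsan_summary and vsan_mm_on and the file names are hypothetical and are not part of any VMware tool:

```shell
#!/bin/sh
# Hypothetical helpers (not VMware tools). Each argument is a file holding one
# host's saved "esxcli vsan cluster get" output, e.g. created on host thor with:
#   esxcli vsan cluster get > /tmp/thor.txt

# vsan_summary FILE...
# Prints "<file> state=<Local Node State> mm=<Maintenance Mode State>" per file.
vsan_summary() {
  for f in "$@"; do
    awk -v file="$f" '
      # Strip everything up to the first ": " to keep only the value.
      /^ *Local Node State:/       { sub(/^[^:]*: */, ""); state = $0 }
      /^ *Maintenance Mode State:/ { sub(/^[^:]*: */, ""); mm = $0 }
      END { printf "%s state=%s mm=%s\n", file, state, mm }
    ' "$f"
  done
}

# vsan_mm_on FILE...
# Lists the files whose vSAN Maintenance Mode State is ON (a node in that
# state is not providing storage to the cluster). Assumes file names
# contain no spaces.
vsan_mm_on() {
  vsan_summary "$@" | awk '$3 == "mm=ON" { print $1 }'
}
```

With the three outputs above saved as thor.txt, odin.txt and medusa.txt, running "vsan_summary thor.txt odin.txt medusa.txt" should print "thor.txt state=BACKUP mm=OFF", "odin.txt state=AGENT mm=ON" and "medusa.txt state=MASTER mm=ON". Running "vsan_mm_on" on the same files would list odin.txt and medusa.txt.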

[root@medusa:~] esxcli vsan trace get
VSAN Traces Directory: /vsantraces
Number Of Files To Rotate: 8
Maximum Trace File Size: 45 MB
Log Urgent Traces To Syslog: true

I checked network communication between the nodes, and it seems solid.

###########

Looking at logs:

[root@thor:~] tail -100 /var/log/syslog.log
2021-08-10T11:26:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:26:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:26:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:270
2021-08-10T11:26:00Z root: CalcFreeSpace sizeKB: 5448, freeMB: 270
2021-08-10T11:28:00Z crond[2097972]: USER root pid 2142923 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:28:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:28:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:28:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:28:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:28:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:270
2021-08-10T11:28:00Z root: CalcFreeSpace sizeKB: 5468, freeMB: 270
2021-08-10T11:28:53Z /etc/init.d/vsanmgmtd: Terminating watchdog process with PID 2142268
2021-08-10T11:28:53Z watchdog-vsanperfsvc: [2142268] Signal received: exiting the watchdog
2021-08-10T11:28:58Z watchdog-vsanperfsvc: [2143100] Begin 'vsanmgmtd -c /etc/vmware/vsan/vsanmgmt-config.xml', min-uptime = 60, max-quick-failures = 1, max-total-failures = 1000000, bg_pid_file = '', reboot-flag = '0'
2021-08-10T11:28:58Z watchdog-vsanperfsvc: Executing 'vsanmgmtd -c /etc/vmware/vsan/vsanmgmt-config.xml'
2021-08-10T11:30:00Z crond[2097972]: USER root pid 2143171 cmd /bin/hostd-probe.sh ++group=host/vim/vmvisor/hostd-probe/stats/sh
2021-08-10T11:30:00Z crond[2097972]: USER root pid 2143172 cmd /bin/crx-cli gc
2021-08-10T11:30:00Z crond[2097972]: USER root pid 2143173 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:30:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:30:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:30:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:30:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:30:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:30:00Z root: CalcFreeSpace sizeKB: 5488, freeMB: 269
2021-08-10T11:32:00Z crond[2097972]: USER root pid 2143317 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:32:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:32:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:32:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:32:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:32:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:32:00Z root: CalcFreeSpace sizeKB: 5508, freeMB: 269
2021-08-10T11:34:00Z crond[2097972]: USER root pid 2143461 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:34:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:34:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:34:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:34:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:34:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:34:00Z root: CalcFreeSpace sizeKB: 5528, freeMB: 269
2021-08-10T11:35:00Z crond[2097972]: USER root pid 2143599 cmd /bin/hostd-probe.sh ++group=host/vim/vmvisor/hostd-probe/stats/sh
2021-08-10T11:36:00Z crond[2097972]: USER root pid 2143607 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:36:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:36:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:36:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:36:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:36:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:36:00Z root: CalcFreeSpace sizeKB: 5548, freeMB: 269
2021-08-10T11:38:00Z crond[2097972]: USER root pid 2143761 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:38:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:38:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:38:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:38:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:38:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:38:00Z root: CalcFreeSpace sizeKB: 5568, freeMB: 269
2021-08-10T11:40:00Z crond[2097972]: USER root pid 2143898 cmd /bin/hostd-probe.sh ++group=host/vim/vmvisor/hostd-probe/stats/sh
2021-08-10T11:40:00Z crond[2097972]: USER root pid 2143899 cmd /bin/crx-cli gc
2021-08-10T11:40:00Z crond[2097972]: USER root pid 2143900 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:40:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:40:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:40:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:40:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:40:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:40:00Z root: CalcFreeSpace sizeKB: 5588, freeMB: 269
2021-08-10T11:42:00Z crond[2097972]: USER root pid 2144050 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:42:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:42:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:42:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:42:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:42:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:42:00Z root: CalcFreeSpace sizeKB: 5608, freeMB: 269
2021-08-10T11:44:00Z crond[2097972]: USER root pid 2144190 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:44:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:44:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:44:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:44:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:44:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:44:00Z root: CalcFreeSpace sizeKB: 5628, freeMB: 269
2021-08-10T11:45:00Z crond[2097972]: USER root pid 2144325 cmd /bin/hostd-probe.sh ++group=host/vim/vmvisor/hostd-probe/stats/sh
2021-08-10T11:46:00Z crond[2097972]: USER root pid 2144333 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:46:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:46:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:46:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:46:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:46:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:46:00Z root: CalcFreeSpace sizeKB: 5648, freeMB: 269
2021-08-10T11:48:00Z crond[2097972]: USER root pid 2144488 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:48:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:48:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:48:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:48:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:48:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:48:00Z root: CalcFreeSpace sizeKB: 5672, freeMB: 269
2021-08-10T11:50:00Z crond[2097972]: USER root pid 2144630 cmd /bin/hostd-probe.sh ++group=host/vim/vmvisor/hostd-probe/stats/sh
2021-08-10T11:50:00Z crond[2097972]: USER root pid 2144631 cmd /bin/crx-cli gc
2021-08-10T11:50:00Z crond[2097972]: USER root pid 2144632 cmd /usr/lib/vmware/vsan/bin/vsanObserver.sh ++group=host/vim/vmvisor/vsanobserver
2021-08-10T11:50:00Z root: There are 1 /usr/lib/vmware/vsan/bin/vsanObserver.sh running ...
2021-08-10T11:50:00Z root: Cannot parse UUID from /vsantraces
2021-08-10T11:50:00Z root: Failed to get freeMB from UUID. Roll back.
2021-08-10T11:50:00Z root: Calc for ramdisk mounted on /, freeMB:28
2021-08-10T11:50:00Z root: Calc for ramdisk mounted on /vsantraces, freeMB:269
2021-08-10T11:50:00Z root: CalcFreeSpace sizeKB: 5692, freeMB: 269
[root@thor:~]
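
Most of the 100 lines above are the same few messages repeating every two minutes, which makes anything unusual hard to spot. The small sketch below assumes the syslog format shown above (an ISO-8601 timestamp, a space, then the message). It strips the timestamp, masks digit runs such as PIDs, and counts the distinct messages. summarize_log is a hypothetical name. Masking digits deliberately hides values like freeMB:28, so drop the second sed expression when the numbers matter:

```shell
#!/bin/sh
# summarize_log FILE...  (hypothetical helper)
# Collapses repetitive syslog lines into "count message", most frequent first.
summarize_log() {
  # 1) drop the leading timestamp field (everything up to the first space)
  # 2) replace each run of digits with N so "pid 2142923" and "pid 2143171" match
  # 3) count identical messages and sort by count, highest first
  sed -e 's/^[^ ]* //' -e 's/[0-9][0-9]*/N/g' "$@" | sort | uniq -c | sort -rn
}
```

On the host this could be run as "summarize_log /var/log/syslog.log | head -n 20". The same pipeline can also be pointed at other logs, such as /var/log/vmkernel.log.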

###########

Questions:

1) Is there a debug/repair path document for repairing or recovering a vSAN volume?

  - What I found so far: 

Commands: https://vdc-repo.vmware.com/vmwb-repository/dcr-public/26334f54-ee84-47c2-b2f3-901f51cbc98a/d3f55719...

https://kb.vmware.com/s/article/2059091  

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/products/vsan/vmw-gdl-vsa...

2) Are there recommendations on how to root-cause this?

This is my first time trying to repair vSAN, so I'm likely missing something obvious.

Nerd needing coffee
TheBobkin
Champion

@JeremeyWise 

Maintenance Mode State: ON

These nodes are in a vSAN-Decom state, i.e. they are not providing any storage to the cluster.

If they are not in vSphere Maintenance Mode (MM), then the two MM states are out of sync. Move or power off any VMs on these nodes, place them in vSphere MM with the 'No Action' option, then immediately take them out of MM. This gets the vSAN MM and vSphere MM states back in sync (and out of MM).
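
The check and fix described above can be written as a dry-run sketch that only prints commands for review. It assumes the vSAN state comes from the Maintenance Mode State line of "esxcli vsan cluster get" (ON/OFF) and the vSphere state from "esxcli system maintenanceMode get" (Enabled/Disabled). mm_sync_plan is a hypothetical helper. Confirm the --vsanmode noAction option with "esxcli system maintenanceMode set --help" on the installed ESXi build before running anything it prints:

```shell
#!/bin/sh
# mm_sync_plan VSAN_MM VSPHERE_MM  (hypothetical helper; dry run, prints only)
#   VSAN_MM    : "Maintenance Mode State" from esxcli vsan cluster get (ON/OFF)
#   VSPHERE_MM : output of esxcli system maintenanceMode get (Enabled/Disabled)
mm_sync_plan() {
  if [ "$1" = "ON" ] && [ "$2" = "Disabled" ]; then
    # Out of sync: vSAN treats the node as decommissioned, vSphere does not.
    # Enter MM without data movement, then exit right away to clear both states.
    echo "# first move or power off any VMs running on this host"
    echo "esxcli system maintenanceMode set --enable true --vsanmode noAction"
    echo "esxcli system maintenanceMode set --enable false"
  elif [ "$1" = "ON" ] && [ "$2" = "Enabled" ]; then
    # Both layers agree the host is in MM; leave it through the normal process.
    echo "# in sync: host is in maintenance mode on both layers"
  else
    echo "# no vSAN decom state detected (vSAN MM=$1, vSphere MM=$2)"
  fi
}
```

For example, "mm_sync_plan ON Disabled" prints the VM reminder followed by the enter/exit pair. This matches the sequence in the reply: enter MM with No Action, then leave MM immediately.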

JeremeyWise
Enthusiast

That worked. 

Thanks for the quick reply.

Does VMware have a vSAN root-cause/debug flow diagram that one can follow? I suspect others could use this too.

TheBobkin
Champion

@JeremeyWise 

Happy to help.

Step 1 for any vSAN issue is to check vSAN/Skyline Health. There is, of course, a health check that would indicate this particular issue is occurring:

https://kb.vmware.com/s/article/51464
