VMware Cloud Community
nejcs01
Contributor

A few VMs lost after error on disk

Hi,

A few VMs disappeared when our vSAN cluster supposedly experienced a disk error. This is what I found in the logs:

Alarm 'Errors occurred on the disk(s) of a Virtual SAN host' on 10.20.0.73 triggered an action

After that, some machines were renamed to something like this: /vmfs/volumes/vsan:525fbd9975b4c517-40279e280259bf77/3c678859-d55a-9e92

When looking at cluster configuration / Disk management, all disks look healthy.

I can see the files of the missing VMs in the file browser, but their size is wrong, and even if I import a VM it will not start.

Is there something I can do to save the VMs? It turns out that the latest backup is already 4 days old.

Any help would be greatly appreciated.

Best regards,

Jernej


TheBobkin
Champion

Hello nejcs01,

What exactly do you mean by "missing"? In vSphere inventory?

It is likely that the "missing" VMs are the ones that were renamed to a partial directory path.

To confirm this, open a console to the renamed VMs and check whether they are the ones you are missing.

You can easily fix the naming of those VMs via this method:

Log in to RVC via your vCenter VM/vCSA console or SSH and run:

> vsan.fix_renamed_vms <path_to_vms>

e.g. vsan.fix_renamed_vms ./localhost/DCName/Computers/ClusterName/ResourcePools/vms/Misnamed_VM

or to do multiple at once run this with * against the directory listing the VMs:

> vsan.fix_renamed_vms ./localhost/DCName/Computers/ClusterName/ResourcePools/vms/*

For each VM, it will ask whether you want to rename it (Y/N) and display the name it will be given, as read from the .vmx file.
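If you want to check beforehand which name a .vmx file carries, the following is a minimal Python sketch. It assumes the name is stored under the displayName key in the usual key = "value" layout of a .vmx file. The sample contents and the helper name read_display_name are made up for illustration, and this is not a description of how RVC itself reads the file.

```python
import re


def read_display_name(vmx_text):
    """Return the displayName value found in .vmx contents, or None if absent."""
    for line in vmx_text.splitlines():
        # .vmx entries are assumed to look like: key = "value"
        match = re.match(r'\s*displayName\s*=\s*"(.*)"\s*$', line)
        if match:
            return match.group(1)
    return None


# Hypothetical .vmx contents, not taken from the affected cluster.
sample_vmx = '''.encoding = "UTF-8"
config.version = "8"
displayName = "web-server-01"
guestOS = "centos-64"
'''

print(read_display_name(sample_vmx))
```

Copy the .vmx file off the datastore first and run this against the copy, so nothing on the vSAN datastore is touched.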

More info on this and RVC in general:

www.virten.net/2017/07/vsan-6-6-rvc-guide-part-6-troubleshooting/#vsan-fix_renamed_vms

Bob

nejcs01
Contributor

Hi TheBobkin,

Thank you for your quick answer.

If I browse to the directory of the affected VMs, the files are still there. I tried to register a VM, and it did register, but it now has a status of VmName (invalid) in the vCenter console.

So I guess the files are really damaged.

Interestingly, I tried

esxcli vsan health

but it says

Error: Unknown command or namespace vsan health

esxcli vsan cluster get

returns state Healthy.

Do you know how to get this debugging working?

Nejc

TheBobkin
Champion

Hello Nejc,

The VM is likely getting marked as invalid because it is already registered on this or another node with an incorrect name (e.g. "/vmfs/volumes/vsan:525fbd9975b4c517-40279e280259bf77/3c678859-d55a-9e92").

You should be able to find out which VMs these are by navigating to the namespace folders referenced in the naming, e.g. the folder for the above VM will start with "3c678859-d55a-9e92". This can be done either via the datastore browser or the CLI.

I would advise using the RVC function from my last comment to rename these first. If the VMs are still getting marked as invalid after doing this, then start taking a closer look at the .vmx files and other potential causes.

The esxcli vsan health and debug namespaces are only available from vSAN 6.6 onward. Use the Health check (Cluster > Monitor > vSAN > Health) and/or RVC to identify other potential issues here.
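To make the folder lookup concrete, here is a small Python sketch that splits a misnamed VM's name into the datastore path and the namespace folder prefix to look for. The helper name namespace_folder is made up for illustration, and it assumes the name is a plain slash-separated path like the one quoted above.

```python
import posixpath


def namespace_folder(misnamed_vm):
    """Split a path-like VM name into (datastore path, namespace folder prefix)."""
    # Drop any trailing slash so the last component is never empty.
    datastore, folder = posixpath.split(misnamed_vm.rstrip("/"))
    return datastore, folder


name = "/vmfs/volumes/vsan:525fbd9975b4c517-40279e280259bf77/3c678859-d55a-9e92"
datastore, folder = namespace_folder(name)
print(datastore)  # the vSAN datastore to browse
print(folder)     # the prefix the namespace folder name starts with
```

The second value is only a prefix, since the name shown in inventory is cut off, so look for the folder on that datastore whose name begins with it.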

Bob

nejcs01
Contributor

Hi Bob,

You were right. I created another VM, attached the disk to it, and it booted correctly.

It's interesting that the .vmx file was changed, by just a few lines, while the disks were apparently OK.

Thanks a lot for your help!

Nejc
