Cannot power on VM, getting File Not Found error.

gurpalrattu · ‎11-04-2021

The error I'm getting is "File was not found. The system cannot find the file specified. Unable to enumerate all disks."

I've tried removing the VM from inventory and then re-registering it, but that leaves the VM inaccessible and it still won't power on. The VM remains in a powered-off state.

Any help is appreciated.

gurpalrattu · ‎11-04-2021

Attaching my log file and vmx file.

kastlr · ‎11-04-2021

Hi,

The vmware-15.log file only covers early October; it ends at 2021-10-03T07:56:29.843Z.

I assume you might have more current logfiles, as I don't expect that your problem exists for more than 4 weeks.

But maybe they aren't really required.

I could see from the .vmx file that the VM wants to mount a CentOS ISO image; maybe this was removed from your vSAN datastore.

As the VM only uses a single vmdk (which might be a snapshot), I assume you have already checked that the vmdk is still there.

Hope this helps a bit.
Greetings from Germany. (CEST)

gurpalrattu · ‎11-04-2021

Unfortunately this VM has sat idle for 4 weeks, so that is the last logfile that was generated. I was hoping a new logfile would be generated when I tried to launch the VM today or when I re-registered it, but nothing new appeared. I've checked the path to the CentOS ISO and it is in fact there. I've also checked the vmdk file: the one listed in the vmx is the largest file in the folder with the type Virtual Disk.

kastlr · ‎11-05-2021

Hi,

Let's do the following.

Check on which node the VM is currently registered.
Then try to start the VM again and follow/check/collect the vmkernel & vmkwarning logs on that node.

With those logs and a list of all files in the VM directory we might be able to sort things out.

Hope this helps a bit.
Greetings from Germany. (CEST)

a_p_ · ‎11-05-2021

The important part in the log file is likely this:

2021-10-03T07:56:00.176Z| vcpu-0| I125: DISKLIB-LINK : "/vmfs/volumes/vsan:52e2fa079c5e6e1a-194d295399193b07/51364561-b813-abaf-5aa2-3cecef71b8c0/la-dfscache04_2.vmdk" : failed to open (The system cannot find the file specified). 
2021-10-03T07:56:00.176Z| vcpu-0| I125: DISKLIB-CHAIN :"/vmfs/volumes/vsan:52e2fa079c5e6e1a-194d295399193b07/51364561-b813-abaf-5aa2-3cecef71b8c0/la-dfscache04_2-000001.vmdk": Failed to open parent "/vmfs/volumes/vsan:52e2fa079c5e6e1a-194d295399193b07/51364561-b813-abaf-5aa2-3cecef71b8c0/la-dfscache04_2.vmdk": The system cannot find the file specified.
2021-10-03T07:56:00.176Z| vcpu-0| I125: DISKLIB-CHAIN : "/vmfs/volumes/vsan:52e2fa079c5e6e1a-194d295399193b07/51364561-b813-abaf-5aa2-3cecef71b8c0/la-dfscache04_2.vmdk" : failed to open (The parent of this virtual disk could not be opened).
2021-10-03T07:56:00.459Z| vcpu-0| I125: DISKLIB-VMFS : "vsan://be484561-ba12-7a9f-2d01-ac1f6b79a24a" : closed.
2021-10-03T07:56:00.459Z| vcpu-0| I125: DISKLIB-LIB : Failed to open '/vmfs/volumes/vsan:52e2fa079c5e6e1a-194d295399193b07/51364561-b813-abaf-5aa2-3cecef71b8c0/la-dfscache04_2-000001.vmdk' with flags 0xa The parent of this virtual disk could not be opened (23).

According to this, the VM has an active snapshot, and the snapshot's base/parent virtual disk object cannot be found (got lost!?) on the vSAN datastore.
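As a rough illustration of how that conclusion can be read out of the log, here is a minimal Python sketch that pulls the child/parent pair from the DISKLIB-CHAIN "Failed to open parent" message. It assumes the message layout shown in the excerpt above; the names `find_missing_parents`, `VM_DIR` and `SAMPLE_LOG` are made up for this example.

```python
import re

# Assumption: the message layout matches the DISKLIB-CHAIN line quoted in this thread.
PARENT_FAILURE = re.compile(
    r'DISKLIB-CHAIN\s*:\s*"(?P<child>[^"]+)"\s*:\s*'
    r'Failed to open parent\s+"(?P<parent>[^"]+)"'
)


def find_missing_parents(log_text):
    """Return (child, parent) descriptor paths for each failed parent open."""
    return [(m.group("child"), m.group("parent"))
            for m in PARENT_FAILURE.finditer(log_text)]


# Two lines taken from the excerpt above, rebuilt with a shared directory prefix.
VM_DIR = ("/vmfs/volumes/vsan:52e2fa079c5e6e1a-194d295399193b07/"
          "51364561-b813-abaf-5aa2-3cecef71b8c0")
SAMPLE_LOG = (
    '2021-10-03T07:56:00.176Z| vcpu-0| I125: DISKLIB-CHAIN :"'
    + VM_DIR + '/la-dfscache04_2-000001.vmdk": Failed to open parent "'
    + VM_DIR + '/la-dfscache04_2.vmdk": '
    'The system cannot find the file specified.\n'
    '2021-10-03T07:56:00.176Z| vcpu-0| I125: DISKLIB-CHAIN : "'
    + VM_DIR + '/la-dfscache04_2.vmdk" : '
    'failed to open (The parent of this virtual disk could not be opened).\n'
)

if __name__ == "__main__":
    for child, parent in find_missing_parents(SAMPLE_LOG):
        print("snapshot descriptor:", child)
        print("missing parent:     ", parent)
```

The child path is the snapshot descriptor (the -000001 file) and the parent path is the base disk that could not be opened.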

André

kastlr · ‎11-05-2021

Hi André,

I don't know how I failed to find that info in the logs, but you're absolutely right.

So thank you for assisting here, and sorry to gurpalrattu that I wasted your time requesting new logs while all info was already available.

Hope this helps a bit.
Greetings from Germany. (CEST)

gurpalrattu · ‎11-05-2021

No worries, is there any way I can recover the parent file?

a_p_ · ‎11-05-2021

I've moved the discussion to the VMware vSAN Discussions.
Maybe @TheBobkin can help!?

André

TheBobkin · ‎11-06-2021

@gurpalrattu, can you generate and attach the contents of:

# esxcli vsan debug object list --all > /tmp/objout

(If ESXi version is lower than 6.7 U3 then omit the '--all' part)

gurpalrattu · ‎11-09-2021

Hi @TheBobkin , I've attached my debug file.

Thanks!

TheBobkin · ‎11-09-2021

@gurpalrattu The missing vmdk looks to be Object UUID: 53364561-32c9-4e6d-b67b-3cecef71b8c0 - this can be confirmed using:
# cat /vmfs/volumes/vsan:52e2fa079c5e6e1a-194d295399193b07/51364561-b813-abaf-5aa2-3cecef71b8c0/la-dfscache04_2.vmdk
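
If the descriptor has been copied off the host, the UUID can also be pulled out with a few lines of Python. This is only a sketch: it assumes the descriptor references its backing object as a quoted "vsan://<uuid>" string (the same form seen in the DISKLIB-VMFS log line earlier in this thread) and that snapshot descriptors carry a parentFileNameHint entry. The sample descriptor text is illustrative, not the poster's actual file, and `describe_descriptor` is a hypothetical helper name.

```python
import re

# Assumption: the extent line references the object as "vsan://<36-char uuid>".
VSAN_EXTENT = re.compile(r'"vsan://(?P<uuid>[0-9a-fA-F-]{36})"')
# Assumption: snapshot descriptors name their parent via parentFileNameHint.
PARENT_HINT = re.compile(
    r'^parentFileNameHint\s*=\s*"(?P<parent>[^"]*)"', re.MULTILINE
)


def describe_descriptor(text):
    """Extract the vSAN object UUID and the parent hint (if any) from a descriptor."""
    uuid = VSAN_EXTENT.search(text)
    parent = PARENT_HINT.search(text)
    return {
        "object_uuid": uuid.group("uuid").lower() if uuid else None,
        "parent": parent.group("parent") if parent else None,
    }


# Illustrative descriptor fragment; the size and layout are made up,
# only the UUID comes from this thread.
SAMPLE_DESCRIPTOR = '''# Extent description
RW 209715200 VMFS "vsan://53364561-32c9-4e6d-b67b-3cecef71b8c0"
'''

if __name__ == "__main__":
    print(describe_descriptor(SAMPLE_DESCRIPTOR))
```

A base disk would be expected to have no parent hint, while the -000001 descriptor should point back at la-dfscache04_2.vmdk.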

This Object is nowhere to be found in the esxcli debug object list output, so it was potentially FTT=0 and its only component was lost or deleted. There are 3 inaccessible FTT=0 Objects here (they still have some sub-components remaining, so some details about them are still visible), but none of them has the Group UUID of la-dfscache04, and they look to be just redo vmdks.
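
For anyone wanting to repeat that check against their own /tmp/objout, a minimal Python sketch follows. It assumes each object in the dump starts with a line of the form "Object UUID: <uuid>" (the label used above); the function names and the UUIDs in the sample text are hypothetical placeholders, not values from the attached file.

```python
import re

# Assumption: each object block in the dump begins with "Object UUID: <uuid>".
OBJECT_UUID_LINE = re.compile(
    r'^\s*Object UUID:\s*(?P<uuid>[0-9a-fA-F-]{36})\s*$', re.MULTILINE
)


def listed_object_uuids(objout_text):
    """Collect every object UUID that appears in the dump (lower-cased)."""
    return {m.group("uuid").lower() for m in OBJECT_UUID_LINE.finditer(objout_text)}


def is_listed(objout_text, uuid):
    """True if the given object UUID has its own entry in the dump."""
    return uuid.lower() in listed_object_uuids(objout_text)


# Hypothetical dump fragment with placeholder UUIDs, for illustration only.
SAMPLE_OBJOUT = """Object UUID: 11111111-2222-3333-4444-555555555555
   Health: healthy
Object UUID: aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee
   Health: inaccessible
"""

if __name__ == "__main__":
    missing = "53364561-32c9-4e6d-b67b-3cecef71b8c0"
    # For a real check: objout = open("/tmp/objout").read()
    print(missing, "listed:", is_listed(SAMPLE_OBJOUT, missing))
```

If the UUID from the descriptor has no entry of its own, vSAN no longer reports that object at all, which matches the finding above.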

Is there any particular reason you have almost all Objects in this cluster stored with a forceProvisioning=1 Storage Policy? This is a bad idea because you may have had Objects unwittingly created as FTT=0 and if they are behind a snapshot vmdk it can be hard to notice this.

I don't think anything can be done here unfortunately, you can see if there is any reference to the Object still in CMMDS for potentially more clues e.g.:
# cmmds-tool find -t DOM_OBJECT -u 53364561-32c9-4e6d-b67b-3cecef71b8c0
And you can potentially get a hint as to whether the Object was FTT=0 by checking old vSAN Health summary logs on vCenter (stored in /var/log/vmware/vsan-health/) from any times when a node was in Maintenance-Mode with Ensure Accessibility - depending on the cluster and distribution of data, some things can be inferred from this e.g. if you have a 3-node cluster and just a few FTT=0 Objects then these will show as healthy while a node is in MM with EA whereas all of the other FTT=1 Objects will show as reduced-availability/reduced-availability-with-no-rebuild.

What immediately preceded the Object becoming inaccessible? e.g. a disk/disk-group failure or snapshot-based backups taken?
