VMware Communities
osfda
Contributor

Filesystem Errors: Windows 10 guest on an ext4 host filesystem (Debian 11); VMware Workstation 16.2.4

I am getting the infamous "The operation on file...vmdk failed. If the file resides on a remote filesystem..." error, with the choices of Cancel, Continue, and Retry. If I persistently attempt Continue, it moves forward with the boot of Windows 10.

Observation: why is there no way to set a default response, such as Continue, so that all disk errors are simply reported to the guest operating system? (If that's what they are...)

Given that the error indicates a disk error, in Windows I did a "chkdsk /f", rebooted, did a whole lot of Continues (!), then after the reboot also ran the following (as administrator) to rule out a misleading error caused by missing components:

 

dism /Online /Cleanup-Image /CheckHealth
dism /Online /Cleanup-Image /ScanHealth
dism /Online /Cleanup-Image /RestoreHealth
sfc /SCANNOW

 

 

[Again: I had to do a lot of Continues to get through that; why one cannot just set a default answer in a vmware instance of Continue, I do not know.]

Also did a "vmware-vdiskmanager -R ...vmdk" from the linux platform to make sure that wasn't corrupt.

ON TOP OF THAT: I did a "touch /forcefsck" and rebooted, to make sure it was not due to a filesystem error on the linux platform itself.

STILL GETTING THOSE ERRORS.

I am evaluating your product; I really want to get it. I considered getting paid support incidents, but that will be insane if it cannot be fixed! (because it's debian 11 or whatever...)

If I am given the nebulous advice of just creating another vmx, with all the Microsoft keys I would have to reenter, I'd sooner take my chances with qemu. But again: I'd rather just get vmware workstation pro to work.

The system appears to work when Continue is pressed; if there is an under-the-hood setting that makes vmware apply that response by default, maybe that could serve as a stopgap...

 

16 Replies
Technogeezer
Immortal

If you are receiving errors from Workstation that a file operation on a VM’s VMDK file failed, check the host operating system’s logs to see whether any errors are being reported there. The error is being thrown by the host, not the guest (VM).
These errors are not normal occurrences and should not be ignored unless you are comfortable with losing data.

Where are the VM’s files located on your system? Boot disk? External disk? Network share?

- Paul (Technogeezer)
Editor of the Unofficial Fusion Companion Guides
osfda
Contributor

The main system partition (/); a partition layout is attached, along with the error message. I am running debian 11.

In dmesg, I have errors like:

 

[61746.352015] blk_update_request: critical medium error, dev nvme0n1, sector 821085720 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0

 

 

And in /var/log/syslog, I have:

 

Aug 13 11:47:40 debian kernel: [61746.352015] blk_update_request: critical medium error, dev nvme0n1, sector 821085720 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
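
Since there are about 20 of these, it may help to see whether the same sector keeps failing. A minimal sketch for that, assuming the kernel message format shown above; the function name `medium_error_sectors` is made up for illustration:

```shell
#!/bin/sh
# Print "device sector" once for every distinct medium error found in the
# kernel log text read from stdin. Assumes the message format quoted above.
medium_error_sectors() {
    sed -n 's/.*critical medium error, dev \([^,]*\), sector \([0-9]*\).*/\1 \2/p' | sort -u
}

# Usage (reading the kernel ring buffer may need root):
#   dmesg | medium_error_sectors
```

A short list of repeated sectors would suggest a small damaged region rather than errors scattered across the whole disk.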

 

 

I get about 20 error dialogs during the boot of the virtual Windows instance.

My disk description from a "lshw -class disk -class storage" shows as:

 

 

 *-storage                  
      description: Non-Volatile memory controller
      product: NVMe SSD Controller PM9A1/980PRO
      vendor: Samsung Electronics Co Ltd
      physical id: 0
      bus info: pci@0000:02:00.0
      version: 00
      width: 64 bits
      clock: 33MHz
      capabilities: storage pm msi pciexpress msix nvm_express bus_master cap_list
      configuration: driver=nvme latency=0
      resources: irq:16 memory:a3100000-a3103fff

 

 

The output of a "smartctl -a /dev/nvme0" is attached (rather lengthy to put inline...), along with the output of "nvme error-log". smartctl shows the "Percentage Used" as 1%; so that's light wear. The disk is barely a year old, and I have seen no anomalous behavior with any other apps.

My current debian is 11.4, with uname showing the kernel as "5.10.0-16-amd64 #1 SMP Debian 5.10.127-2".

I had tried running a Windows 10 instance on vmware workstation a year ago, on debian 10, for a couple of weeks, and never got these errors at the time. Is it possible that the newer kernel modules that vmware builds don't handle this new kernel and SSD hardware correctly?

As I said, I did a "touch /forcefsck" and rebooted.

What other diagnostics would you recommend to check the media?

ender_
Expert

Those dmesg and syslog errors indicate a serious hardware problem with your SSD – back up the data and RMA it.

osfda
Contributor

But I am only getting them when I use vmware workstation; and vmware workstation _does_ do a funky disk access with their kernel mod.

I am going to monitor the errors further while I pause use of vmware workstation, and try qemu. I will also contact Samsung and confer on those errors. It's a pretty high quality SSD, so I'm skeptical it's the hardware. Conceivably it might be the linux kernel driver for that disk; but seeing no other anomalous behavior, my suspicion is that it's the kernel patch done by vmware...

 

ender_
Expert

That just means the virtual disk's files are currently occupying bad pages on your SSD. Try simply copying the VM somewhere else (with any file manager), and pay attention to dmesg while doing that.
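
If you would rather do the copy from a terminal, something along these lines should show which files cannot be read. It is only a sketch: the function name and the paths in the usage comment are placeholders, not anything VMware provides.

```shell
#!/bin/sh
# Copy every regular file from a VM directory to a destination directory and
# print the name of each file that could not be copied completely.
# Returns non-zero if at least one copy failed.
copy_and_report() {
    src_dir=$1
    dst_dir=$2
    failed=0
    mkdir -p "$dst_dir" || return 1
    for f in "$src_dir"/*; do
        [ -f "$f" ] || continue
        if ! cp -p "$f" "$dst_dir"/; then
            echo "FAILED: $f"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (placeholder paths); check the kernel log right afterwards:
#   copy_and_report "/path/to/vm" "/mnt/other-disk/vm-copy"
#   dmesg | tail
```

Any file reported as FAILED is one whose data sits on unreadable blocks.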

osfda
Contributor

Great advice, thank you. Will try that now; I also will call Samsung on Monday when they are open.

If I find anything odd from the copy, will relate it here.

Am going to see if there's a recommended way to do the copy with a utility or options that are designed to handle damaged files (other than a simple cp --preserve...)

ender_
Expert

Start with a regular copy; if any of the files fail to copy, you can try ddrescue, which is designed to copy data off failing media.

osfda
Contributor

These are the vmdks:

 

4.0G Aug 13 13:03 'Windows 10 Pro Worstation x64-s001.vmdk'
4.0G Aug 13 13:03 'Windows 10 Pro Worstation x64-s002.vmdk'
3.0G Aug 13 13:03 'Windows 10 Pro Worstation x64-s003.vmdk'
4.0G Aug 13 13:03 'Windows 10 Pro Worstation x64-s004.vmdk'
4.0G Aug 13 13:03 'Windows 10 Pro Worstation x64-s005.vmdk'
4.0G Aug 13 13:03 'Windows 10 Pro Worstation x64-s006.vmdk'
3.6G Aug 13 13:03 'Windows 10 Pro Worstation x64-s007.vmdk'
 56M Aug 12 15:50 'Windows 10 Pro Worstation x64-s008.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s009.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s010.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s011.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s012.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s013.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s014.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s015.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s016.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s017.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s018.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s019.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s020.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s021.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s022.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s023.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s024.vmdk'
512K Nov  6  2021 'Windows 10 Pro Worstation x64-s025.vmdk'
416M Aug 13 13:03 'Windows 10 Pro Worstation x64-s026.vmdk'
2.0K Aug 13 11:43 'Windows 10 Pro Worstation x64.vmdk'

 

The .vmx points to "Windows 10 Pro Worstation x64.vmdk"; so I presume the other ...x64-s... files are pieces of it? (Does "Windows 10 Pro Worstation x64.vmdk" link to them?)

I figure the safest way to copy these is to use vmware-vdiskmanager to clone each one (per the Virtual Disk Manager User’s Guide). If there is a bad block, either the utility will report it and fail, or it will make a best effort to produce a plausible replica that might work.

I will mv these vmdk's into a subfolder (which won't really copy them; within the same filesystem it only updates the ext4 directory entries, and the data stays in the same blocks on disk), then clone them back to the original parent folder with the same names. The command I would use to do that is:

 

vmware-vdiskmanager -r {backupFolder/filename}.vmdk -t {type} {filename}.vmdk

 

The trick is: what do I use for the type argument?

 

-t [0|1|2|3|4|5] Specifies the virtual disk type. This option is required when you create or convert a virtual disk. Choose one of the following types:
  0 – create a growable virtual disk contained in a single file (monolithic sparse).
  1 – create a growable virtual disk split into 2GB files (split sparse).
  2 – create a preallocated virtual disk contained in a single file (monolithic flat).
  3 – create a preallocated virtual disk split into 2GB files (split flat).
  4 – create a preallocated virtual disk compatible with ESX server (VMFS flat).
  5 – create a compressed disk optimized for streaming.

 

For most of those files, the correct answer would seem to be a type of 3: "create a preallocated virtual disk split into 2GB files (split flat)". But some are 4GB. Do you happen to know how to check the type of a vmdk?

Seeing if I can get it from the player...
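
One way to check, as far as I can tell: the small 2.0K .vmdk is a plain-text descriptor, and it carries a createType line. The sketch below reads that line and maps it to a -t number. The mapping is my own reading of the type list quoted above, and both function names are made up for illustration.

```shell
#!/bin/sh
# Print the createType value from a VMDK descriptor file (the small text .vmdk).
vmdk_create_type() {
    sed -n 's/^createType="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Map a createType to the vmware-vdiskmanager -t number, following the type
# list quoted above (assumed mapping). Prints nothing for any other type.
vdiskmanager_type() {
    case $1 in
        monolithicSparse)     echo 0 ;;
        twoGbMaxExtentSparse) echo 1 ;;
        monolithicFlat)       echo 2 ;;
        twoGbMaxExtentFlat)   echo 3 ;;
        vmfs)                 echo 4 ;;
        streamOptimized)      echo 5 ;;
    esac
}

# Usage:
#   vdiskmanager_type "$(vmdk_create_type 'Windows 10 Pro Worstation x64.vmdk')"
```

If the result is empty, the descriptor uses a type outside that list and the -t value would have to be worked out another way.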

 

osfda
Contributor

Maybe if I copy the main vmdk, it will know to recreate the others.

osfda
Contributor

Maybe type 1 is what I want: "create a growable virtual disk split into 2GB files (split sparse)"

osfda
Contributor

It failed at 5%:

vmware-vdiskmanager -r "old_vmdks/Windows 10 Pro Worstation x64.vmdk" -t 1 "Windows 10 Pro Worstation x64.vmdk"

Creating disk 'Windows 10 Pro Worstation x64.vmdk'
 Convert: 5% done.Failed to convert disk: Unknown error (0x1)

Going to see what linux command is best for copying a file that resides on bad sectors...

osfda
Contributor

Trying a:

dd if=fileWithBadBlocks of=recoveredFile bs=4k conv=noerror,sync
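
One thing to watch with that command: conv=noerror keeps going past read errors, and conv=sync pads every short or failed block with zeros up to the 4k block size, so the last block of a file whose size is not a multiple of 4k (such as the 2.0K descriptor) gets padded too. A sketch that trims the copy back to the original length afterwards, assuming GNU coreutils (`stat -c`, `truncate`); the function name `rescue_copy` is made up:

```shell
#!/bin/sh
# Copy src to dst with dd, substituting zeros for unreadable 4 KiB blocks,
# then trim dst back to the original length, because conv=sync pads the
# final partial block with zeros.
rescue_copy() {
    src=$1
    dst=$2
    size=$(stat -c %s "$src") || return 1
    # dd's own exit status is deliberately not checked: read errors are expected.
    dd if="$src" of="$dst" bs=4k conv=noerror,sync
    truncate -s "$size" "$dst"
}

# Usage, one extent at a time, from the folder that should receive the copies:
#   for f in old_vmdks/*.vmdk; do rescue_copy "$f" "$(basename "$f")"; done
```

Any block that could not be read ends up as zeros in the copy, so the guest may still see damaged files there.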
osfda
Contributor

Now I KNOW there are going to be errors doing those dd's; the question is: can I get a working VM after those vmdks are copied in such a manner?

osfda
Contributor

Well, it appears to have corrected the instance (I'll correct myself here if any errors pop up again; but NONE now...).

The startup of Windows 10 is taking soooo long; once it starts, it's responsive.
I have 32GB of physical RAM.

I will talk with Samsung Monday (preferably with a linuxhead...) who can assess what happened and what might still be happening.

I should do a "fsck -cfvr /dev/..." from a bootable USB; I don't think the automatic fsck was actually performed (though the /forcefsck file did get deleted, so I'm not sure; shouldn't an fsck catch errors on a disk like that?)

I read online that abrupt VM shutdowns can cause such corruption; perhaps I did that accidentally, but I don't remember doing so.

 

osfda
Contributor

Oh: and THANKS!

osfda
Contributor

After a few dism's and sfc /scannow's, the boot time of the Windows instance is much more reasonable now. The boot was probably struggling with missing/corrupted components.

I also got a spooky restart not instigated by me, but read that others had that thanks to Windows 10's <sarcasm>fabulous</sarcasm> updating policies...

 
