VMware Communities
gorbehnare
Contributor

VM disk gets corrupted randomly: Buffer I/O Error on dev sda*, logical block xxxxxxxxx, async page read

Hello

I have been struggling with this for quite a while, ever since Workstation 15.0.0 (I never saw it in previous versions of VMware Workstation). Basically the VM's virtual disk gets corrupted to the point that the guest OS crashes and will never boot again.

Host OS: Windows 10 Pro (1803, 1809 and now 1903)

VMware Workstation 15.0.0, gets worse with every update, can't even boot the guest VM with 15.1.x

Guest VM: Ubuntu Server 18.04.2 LTS

Symptoms: The VM works fine at first, with no signs of corruption. After it has been shut down, or has been running for an extended period, it reports bad blocks or bad sectors on the disk, leading to data loss.

I created a new VM with a "SATA" virtual disk, but could not even install Ubuntu Server 18.04.2 LTS, because disk corruption occurs during installation or right after the first reboot. With the SCSI (recommended) option it at least seems to install properly.

Troubleshooting:

Scanned my entire storage for any signs of bad sectors on physical disk, and nothing shows. Everything is working fine on the host.

Tried moving the VM to another computer (a desktop) and the issue persists (indicating the issue is not with the particular notebook I'm using). Updated VMware Workstation to the latest version and the problem got worse. Uninstalled and reinstalled VMware several times, and ended up downgrading to 15.0.0. It completely corrupts any Linux virtual machine I have, and I have to restore from a snapshot or from a backup (zip archive).

[Screenshot attached: Ubuntu Server-2019-06-05-11-42-00.png]

We are using VMware Workstation as a teaching tool at our college. A few students have started experiencing this, but not all. I worry that we will have to switch to VirtualBox or some other virtualization software mid-semester and that it will mess up everyone's work.

I'm out of ideas on where to start looking... how do you fix bad sectors on a virtual disk?

Accepted Solutions
continuum
Immortal

> Please attach Ubuntu Server.vmdk to your next reply.

At the moment my best theory is that the OneDrive sync function is incompatible with Workstation.

But you may also have an obscure parameter in the vmdk - that is why I ask for it.


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

14 Replies
continuum
Immortal

Which filesystem do you use on the host? I hope it is not compressed NTFS?

Do you use snapshots a lot or have you enabled autodestruct - oops - autoprotect ?

We very likely also need your latest vmware.log files.


gorbehnare
Contributor

I'm not using compression on the disk. It is NTFS (Windows 10), and only indexing is enabled. It does have BitLocker enabled on the volume as well... I hope that's not an issue, because all my workstations have BitLocker enabled.

gorbehnare
Contributor

I do take a lot of snapshots, because the darn thing is too unreliable.

Unfortunately for me, this morning I cannot even restore the VM from a snapshot. The VMware Workstation UI just sits there and does nothing, and the VM is frozen with a black screen. It doesn't seem to recover from that; I basically have to reboot the system to get rid of it. Ending the task, closing the window, nothing works. The only thing I see now is that vmware-vmx.exe is maxing out one core and going nowhere. No disk activity, no more RAM being consumed... This is strange... I can't figure out what the heck it's doing.

gorbehnare
Contributor

OK, now I think the ongoing VM disk corruption issue has evolved since I moved to Windows 10 1903... I had updated to the latest VMware Workstation, but I started seeing corruption. Now that I have downgraded to 15.0.0, I see a black screen on my machines and no boot. It seems to be the issue described here:

Workstation pro not working on windows 1903

It seems like I'm suffering from multiple issues. I'm going to try to update VMware Workstation to the latest version again (although my desktop is running the latest and still seeing corruption), hopefully at least this issue will be resolved.

I can't even create new VMs and start from scratch anymore. If this is not resolved later today I may just give up and switch to something else. Luckily these are just lab VMs, and I keep good backups of all configurations. I can't say the same for our students though. They will be in a world of pain right before mid-term exams (though that will also be a good lesson :D ).

continuum
Immortal

Is "OneDrive - Georgian College" a mountpoint for an external USB-disk ?

> I do take a lot of snapshots, because the darn thing is too unreliable.

It actually is the other way round - the more snapshots the more unreliable a VM becomes !

Do you use any aggressive Antivirus-tools that scan vmdks ?

Please attach Ubuntu Server.vmdk to your next reply.

Your VM is probably dead because of this issue:

###### dumping content of iov ######

2019-06-05T11:19:13.844-04:00| vmx| I125: READ

2019-06-05T11:19:13.844-04:00| vmx| I125: startSector = 33554304

2019-06-05T11:19:13.844-04:00| vmx| I125: numSectors = 8

2019-06-05T11:19:13.844-04:00| vmx| I125: numBytes = 4096

2019-06-05T11:19:13.844-04:00| vmx| I125: numEntries = 1

2019-06-05T11:19:13.844-04:00| vmx| I125:   entries[0] = 2A871ABE000 / 4096

2019-06-05T11:19:13.844-04:00| vmx| I125: DISKLIB-LIB   : RWv failed ioId: #4286 (1355) (75) .

2019-06-05T11:19:13.844-04:00| vmx| I125: DISK: Disk I/O on 'scsi0:0' failed: Read beyond end of object (1355)
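
To find every such failure in a vmware.log together with its timestamp, a small script can help. The following is a minimal sketch, assuming each log line has the `timestamp| vmx| I125: message` layout shown above; the function name `find_disk_failures` is made up for illustration and is not part of any VMware tool.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2019-06-05T11:19:13.844-04:00| vmx| I125: DISK: Disk I/O on 'scsi0:0' failed: Read beyond end of object (1355)
# The layout is an assumption taken from the excerpt in this thread.
FAILURE_RE = re.compile(
    r"^(?P<ts>[^|\s]+)\|\s*vmx\|\s*\w+:\s+"
    r"DISK: Disk I/O on '(?P<dev>[^']+)' failed: (?P<msg>.*\S)"
)


def find_disk_failures(lines):
    """Return a (timestamp, device, message) tuple for each disk I/O failure line."""
    failures = []
    for line in lines:
        match = FAILURE_RE.match(line)
        if match:
            # The timestamps in the log are ISO 8601 with a UTC offset.
            ts = datetime.fromisoformat(match.group("ts"))
            failures.append((ts, match.group("dev"), match.group("msg")))
    return failures
```

Passing it the lines of a log file, for example `find_disk_failures(open("vmware.log", errors="replace"))`, would return one tuple per failure, which makes it easier to compare the failure times with other events on the host.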

Did you ever try to expand the VMDK after you started creating snapshots?


gorbehnare
Contributor

> Is "OneDrive - Georgian College" a mountpoint for an external USB-disk ?

No, it's a local folder, but it syncs with OneDrive. I don't keep OneDrive running; I only start it at the end of the day to sync the files (I use it as a backup). I had thought OneDrive might be responsible, so I make sure it is not running when I use the VMs. So OneDrive should not be affecting the VM disk while the VM is running, yet I suddenly see I/O errors in the guest OS at random, without OneDrive being involved as far as I can tell.

> I do take a lot of snapshots, because the darn thing is too unreliable.

> It actually is the other way round - the more snapshots the more unreliable a VM becomes !

Yes, but the disk errors can start no matter what I'm doing, and then my only option is to restore from a snapshot. Unfortunately that is what I have had to resort to in order to keep the VM going; otherwise I'd have to reinstall the entire thing 3 times a day.

> Do you use any aggressive Antivirus-tools that scan vmdks ?

Not that I know of. I just use the Windows Defender that comes with Windows 10. I don't know if Windows Defender is scanning these files, and I don't know how to check (yet). I'll look into that.

> Did you ever try to expand the VMDK after you started creating snapshots?

Nope, it's just a 16GB disk since day 1.

Actually, the issue happens even with a brand-new VM. When I start from scratch (creating a new VM), it can go bad in the middle of the installation, before I even finish installing Ubuntu Server. It might work without issues for a while, so I can install, and then suddenly it goes bad again. Right now I have upgraded VMware to 15.1 and my old VM started working again after restoring from a snapshot.

I have scanned all the physical disks for errors and nothing turned up on either computer. They are both running on Samsung SSDs. The desktop has a 2.5" Samsung 850 Evo (512 GB) and the laptop has a Samsung PM981 NVMe 512GB... could this be some sort of compatibility issue with SSDs? I have used Intel and Crucial SSDs in my previous systems and never saw this happen before.

continuum
Immortal

> Please attach Ubuntu Server.vmdk to your next reply.

At the moment my best theory is that the OneDrive sync function is incompatible with Workstation.

But you may also have an obscure parameter in the vmdk - that is why I ask for it.


gorbehnare
Contributor

I looked into the Windows Defender logs and they don't seem to record which files were scanned, only that the quick scan started, ended, and scanned xxxx files (or at least I can't see the file names in the Event Viewer). Unfortunately I don't have timestamps of when the VMs got corrupted, so I can't compare the two sets of events to see if they line up. Unless someone here has a suggestion regarding that?

gorbehnare
Contributor

VMDK file attached.

continuum
Immortal

See the timestamp of these log entries:

2019-06-05T11:18:40.386-04:00| vmx| I125: DISKLIB-LIB   : RWv failed ioId: #4121 (1355) (75) .

2019-06-05T11:18:40.386-04:00| vmx| I125: DISK: Disk I/O on 'scsi0:0' failed: Read beyond end of object (1355)

2019-06-05T11:18:40.386-04:00| vmx| I125: ###### dumping content of iov ######

2019-06-05T11:18:40.386-04:00| vmx| I125: READ

2019-06-05T11:18:40.386-04:00| vmx| I125: startSector = 33554424

2019-06-05T11:18:40.386-04:00| vmx| I125: numSectors = 8

2019-06-05T11:18:40.386-04:00| vmx| I125: numBytes = 4096

2019-06-05T11:18:40.386-04:00| vmx| I125: numEntries = 1

2019-06-05T11:18:40.386-04:00| vmx| I125:   entries[0] = 2A88D074000 / 4096

I believe that was the moment when your VM passed away.

The vmdk-file itself is fine - nothing to complain about.
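
As a sanity check on those sector numbers, the arithmetic can be sketched as follows. It assumes the "16GB disk" mentioned earlier in the thread means 16 GiB, and takes the 512-byte sector size from the log itself (numBytes = 4096 for numSectors = 8). The helper name is hypothetical.

```python
# Sector size implied by the log: numBytes / numSectors.
SECTOR_BYTES = 4096 // 8

# Assumption: the "16GB" virtual disk is exactly 16 GiB.
DISK_BYTES = 16 * 1024**3
TOTAL_SECTORS = DISK_BYTES // SECTOR_BYTES


def sectors_left_after_read(start_sector, num_sectors, total_sectors=TOTAL_SECTORS):
    """Sectors remaining after the read ends; a negative value means past the end."""
    return total_sectors - (start_sector + num_sectors)


print(TOTAL_SECTORS)                           # 33554432
print(sectors_left_after_read(33554424, 8))    # 0   (the very last 4 KiB of the disk)
print(sectors_left_after_read(33554304, 8))    # 120 (still inside the nominal capacity)
```

If the 16 GiB assumption holds, both failed reads fall within the disk's nominal capacity, and the one at sector 33554424 covers exactly the last 4 KiB. That would be consistent with the backing file being shorter than the virtual disk expects, though the log alone does not prove it.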


gorbehnare
Contributor

Nope, never tried expanding the volume.

Additionally, the issue just popped up again, and I am looking at the processes... OneDrive is definitely not running. It doesn't seem like the antimalware service is doing anything significant, although Windows 10 and all these services are so complex that I really have no idea what I'm looking at anymore. I feel old! :)

I'm at a loss...

I'm going to try to create a new VM now to see if I can capture anything from the new one, or if the same things show in the logs...

gorbehnare
Contributor

This is crazy... I created a new VM and nothing goes wrong, it works... for now. I'm going to move this VM to a non-OneDrive folder and start configuring this guest OS with my backup configurations. It would be really weird if OneDrive is responsible even when it is not supposed to be running. However, it is a possibility, since something must be watching the files all the time to detect changes, regardless of whether OneDrive is running a sync or not. I don't know enough about it to know how that is done.
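
A quick way to check whether a VM folder sits inside a OneDrive-synced location is sketched below. It assumes the OneDrive client exposes its sync roots through the `OneDrive`, `OneDriveCommercial` and `OneDriveConsumer` environment variables, which is typical on Windows 10 but should be verified on each machine; the function names and any example user folder are hypothetical.

```python
import os
from pathlib import PureWindowsPath

# Assumption: these are the variables the OneDrive client sets on Windows.
ONEDRIVE_ENV_VARS = ("OneDrive", "OneDriveCommercial", "OneDriveConsumer")


def onedrive_roots(env=None):
    """Collect the OneDrive sync roots found in the given environment mapping."""
    env = os.environ if env is None else env
    return [PureWindowsPath(env[name]) for name in ONEDRIVE_ENV_VARS if env.get(name)]


def is_under_onedrive(vm_dir, env=None):
    """Return True if vm_dir is a OneDrive root or lies somewhere below one."""
    vm_path = PureWindowsPath(vm_dir)
    for root in onedrive_roots(env):
        # PureWindowsPath compares case-insensitively, like Windows itself.
        if vm_path == root or root in vm_path.parents:
            return True
    return False
```

Calling `is_under_onedrive` with the folder that holds the .vmx and .vmdk files gives a yes/no answer before the VM is started from that location.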

BTW, the timestamp does not correspond with any of the Windows Defender events as far as I can tell. Again, one can never be too sure in Windows 10 environment what is actually going on behind the scenes. The software is just too complicated.

If that works and doesn't get corrupted, then I have to find a way to back up the VMs and sync them between all computers, somehow automatically, that does not involve running 7zip and USB sticks! Any suggestions on that front will be welcome :) Running 7zip and copying to OneDrive seems to work; that's basically how all my other backups are done. However, running 7zip is the painful part that I would like to avoid, since it's wasteful and at some point I'll get lazy and forget to do it (and pay the full price, as these things usually go).

I will also check with the people who have run into this to see if they are using OneDrive or other type of Cloud storage (that's basically what the college recommends using).

gorbehnare
Contributor

I think you are right about OneDrive. It seems that although OneDrive is not supposed to be running, it somehow manages to corrupt the VM disk file just because the files are in the OneDrive folder.

Thank you very much for your help. I guess I'll have to upload only compressed VM folders to OneDrive for backup, since at this point I don't really know what OneDrive is doing to these files when they are not compressed. Interestingly enough, if I shut down the VM, copy the files to the OneDrive folder and sync OneDrive (which takes a while), then run the VM from that location right after the sync completes (on either computer), it comes up with sector errors. This is something that should not happen!
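
One way to confirm whether the files really change during a sync is to hash them before copying and again after OneDrive reports the sync as complete. This is a minimal sketch using only the Python standard library; the function names are made up for illustration, and the VM should be shut down while the hashes are taken.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large .vmdk files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def folder_manifest(folder):
    """Map each file's relative path to its (size in bytes, SHA-256) pair."""
    folder = Path(folder)
    return {
        str(path.relative_to(folder)): (path.stat().st_size, file_sha256(path))
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def changed_files(before, after):
    """List files that were added, removed, resized or altered between manifests."""
    names = set(before) | set(after)
    return sorted(name for name in names if before.get(name) != after.get(name))
```

Taking one manifest of the original VM folder and one of the synced copy, then passing both to `changed_files`, would show which files differ and whether their sizes changed.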

If I just compress the entire folder using 7zip, the files are fine, and I have tested this several times today across both computers. It just takes a bit more time and effort, and I have to make sure I always have the latest version of the 7zip file... somewhat of a mess, but not as bad as losing data in the VM.
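
The compress-then-upload routine could be automated along these lines. This is only a sketch: it uses Python's built-in zip support rather than 7zip, the function name `archive_vm` and any folder paths are hypothetical, and it assumes the VM is shut down before archiving.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_vm(vm_dir, backup_dir, now=None):
    """Zip the whole VM folder into backup_dir under a timestamped name."""
    vm_dir = Path(vm_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # A timestamp in the name makes it obvious which archive is the latest.
    stamp = (now or datetime.now()).strftime("%Y-%m-%d-%H-%M-%S")
    base_name = backup_dir / f"{vm_dir.name}-{stamp}"

    # root_dir/base_dir keep the VM folder name as the top-level entry in the zip.
    archive = shutil.make_archive(
        str(base_name), "zip", root_dir=str(vm_dir.parent), base_dir=vm_dir.name
    )
    return Path(archive)
```

Pointing `backup_dir` at a folder inside OneDrive would mean only the finished archive is ever synced, never the live .vmdk files.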

I'll contact Microsoft support to see if they even know about this or have any ideas. I'm very surprised that more people are not running into issues like this (or maybe it's just me, or something with my setup here). There is no reason OneDrive should corrupt any files, regardless of the file format. You'd figure they just copy the data exactly as is... very strange.

continuum
Immortal

Makes perfect sense to me:

I had not seen this message before:

vmx| I125: DISKLIB-LIB   : RWv failed ioId: #4121 (1355) (75) .
vmx| I125: DISK: Disk I/O on 'scsi0:0' failed: Read beyond end of object (1355)
vmx| I125: ###### dumping content of iov ######
vmx| I125: READ
vmx| I125: startSector = 33554424
vmx| I125: numSectors = 8
vmx| I125: numBytes = 4096
vmx| I125: numEntries = 1
vmx| I125:   entries[0] = 2A88D074000 / 4096

so it did not look like the issues we see most of the time.

Next time I see this, I will ask about OneDrive and other sync tools.

Ulli

edit...

just an idea - did you ever try to move one of those corrupted vmdks outside of the OneDrive directory ?

that may - eventually - fix the problem .... not very likely but ... try it anyway.

