VMware Communities
dom124214
Contributor

hgfs file corruption when using different file handles to the same file in same process, different threads

Hello all,

I'm encountering a problem when using shared folders where files that are written to by a single process are being corrupted if another thread in that same process merely reads from the same file under a separately opened file handle.

This is occurring under VMware Fusion 4.1.4 on Mac OS X 10.7.5, using CentOS 5.8 as the guest OS. The corrupted files end up with blocks of zero bytes overwriting a portion of their data. The size of the blocks does not generally match the length of the missing data.

I have attached the source code to a simple C++ test program that exhibits the problem fairly consistently. A Makefile is included to build the executable.

Usually it hits the bad case on the second or third attempt when it writes out to a shared folder. It doesn't hit the bad case at all when it's outputting to the local file-system. The test program returns non-zero if it reproduces the error, so a simple shell loop can be used to continually run the program until the bad case is hit.

The program has two threads. Each thread has a separately opened file handle to the same file. The first thread opens the two file handles. It first creates a handle to a file for writing, and then it opens a read handle to read back from the file being written out to.

It then sets up the second thread, which is given a file handle for writing, and this second thread writes ever-increasing consecutive integers to the file until it is signalled to stop. After writing out a single unsigned integer, it flushes the file.

The first thread will continually use its read file handle to seek around the file to randomly chosen 64 KB boundaries, and attempt to read 64 KB. This mimics behaviour in our development system where we first encountered the corruption. The first thread performs no writes of its own to the write file handle after calling fopen(), and it only calls fseek() and fread() on the read file handle, as well as stat() on the filename. It's unknown whether the 64 KB boundary reads are significant to reproducing the problem.
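For readers without the attachment, the two-thread arrangement described above might look roughly like the following. This is only a minimal sketch and not the attached program: the function name run_writer_and_reader, the fixed random seed, and the timed stop (instead of SIGINT handling) are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdio>
#include <random>
#include <string>
#include <thread>
#include <vector>
#include <sys/stat.h>

// Hypothetical helper: runs one writer thread and one reader loop against the
// same file through two separately opened handles, for roughly `run_for`.
// Returns the number of integers written, or -1 if the file cannot be opened.
long run_writer_and_reader(const std::string& path, std::chrono::milliseconds run_for) {
    const long kBlock = 64 * 1024;
    std::FILE* wf = std::fopen(path.c_str(), "wb");  // write handle, opened first
    if (!wf) return -1;
    std::FILE* rf = std::fopen(path.c_str(), "rb");  // separate read handle
    if (!rf) { std::fclose(wf); return -1; }

    std::atomic<bool> stop{false};
    long written = 0;  // only touched by the writer until join() below
    std::thread writer([&] {
        // Write consecutive unsigned integers, flushing after each one.
        for (unsigned int n = 0; !stop.load(); ++n) {
            if (std::fwrite(&n, sizeof n, 1, wf) != 1) break;
            std::fflush(wf);
            ++written;
        }
    });

    std::mt19937 rng(12345);  // fixed seed, an arbitrary choice for this sketch
    std::vector<char> buf(kBlock);
    const auto deadline = std::chrono::steady_clock::now() + run_for;
    while (std::chrono::steady_clock::now() < deadline) {
        struct stat st;
        if (stat(path.c_str(), &st) != 0 || st.st_size < kBlock) continue;
        const long blocks = static_cast<long>(st.st_size / kBlock);
        assert(blocks >= 1);
        // Seek to a randomly chosen 64 KB boundary and attempt a 64 KB read.
        const long offset = std::uniform_int_distribution<long>(0, blocks - 1)(rng) * kBlock;
        std::fseek(rf, offset, SEEK_SET);
        const std::size_t got = std::fread(buf.data(), 1, buf.size(), rf);
        (void)got;  // the data itself is not inspected here
    }

    stop.store(true);  // signal the writer, then wait for it to finish
    writer.join();
    std::fclose(rf);
    std::fclose(wf);
    return written;
}
```

On a local file-system the resulting file should simply contain the integers 0, 1, 2, ... in order; the corruption was only seen when the path points into a shared folder.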

If SIGINT is received, or a time limit was given on the command line, the second thread is signalled to stop after writing out any current integer and a subsequent fflush(). The first thread waits on the second thread, and then attempts to verify the output file, reading unsigned integers and checking that they are consecutively numbered from 0. If there's a mismatch, and it's a 0, then more integers are read from the file until we have the full block of zeros. The file position and the number of zeroes (and the number of bytes) is then output to stdout.
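The verification pass described above can be sketched as follows. This is an illustrative reconstruction rather than the attached code: the ZeroBlock struct and find_zero_blocks function are hypothetical names, and this version only collects runs of unexpected zeros, ignoring any other kind of mismatch.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical record for one run of unexpected zeros.
struct ZeroBlock {
    long offset;  // byte offset in the file where the run starts
    long count;   // number of zero-valued integers in the run
};

// Scans a file that should hold consecutive unsigned ints starting at 0 and
// returns every run of zeros found where a non-zero value was expected.
std::vector<ZeroBlock> find_zero_blocks(const std::string& path) {
    std::vector<ZeroBlock> blocks;
    std::FILE* f = std::fopen(path.c_str(), "rb");
    if (!f) return blocks;

    unsigned int value = 0;
    unsigned int expected = 0;
    bool in_run = false;
    while (std::fread(&value, sizeof value, 1, f) == 1) {
        if (value != expected && value == 0) {
            if (!in_run) {
                // A new run starts at the position of the current integer.
                const long off = static_cast<long>(expected) * static_cast<long>(sizeof value);
                blocks.push_back({off, 0});
                in_run = true;
            }
            assert(!blocks.empty());
            ++blocks.back().count;
        } else {
            in_run = false;
        }
        ++expected;
    }
    std::fclose(f);
    return blocks;
}
```

The number of bytes in each run is count * sizeof(unsigned int), which is what the test program prints alongside the file position.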

Any assistance, or even just confirmation of the bug, would be greatly appreciated. It would be nice to know whether similar problems have been encountered in the past, and whether it's likely to be fixed in the near future. Any further questions, or things for me to try, please feel free to ask.

Cheers and thanks,

Dominic

45 Replies
pirelenito
Contributor

Hi,

I am also having this issue on my Vagrant development environment. My use case is a little different: I am doing web development with Node.js and I use a Grunt plugin that checks when a file is changed to automatically perform a browser reload. Whenever that plugin is enabled, this problem starts to happen.

This is the plugin that I mentioned: https://github.com/gruntjs/grunt-contrib-watch#optionslivereload

Would appreciate any update on the resolution of this issue.

Thanks,

steve_goddard
VMware Employee

Hi there,

apologies for the long delay here.

I am in the process of checking in what I hope is the last set of changes to address this issue. There have been multiple changes required.

It should not be too long before these changes make it into a release update that you will be able to pick up.

Thanks so much for your patience and understanding.
Steve

Thanks. Steve
pirelenito
Contributor

Hi Steve,

I just updated to the latest Version 6.0.2 (1398658) and the problem remains.

I am sorry to say this, but I am also very unsatisfied. VMware is a tool I've used every day to work with Vagrant, and it has become unsustainable to continue like this.

Can you give us an update on this? How much longer will it take for this to be properly fixed? A day? A month? Or should I give up and use VirtualBox instead?

Thanks,

Michael_Bender
Contributor

I am doing Android development on an Ubuntu 10.04 LTS VM on my MBP. I was running Fusion 5.X and could do a full build (kernel + userspace stuff) when the source tree was on an Ubuntu virtual disk. When I tried the same thing on a shared folder that pointed to an OS X case-sensitive filesystem, I would get strange build errors, such as the linker complaining about corrupted symbol tables, the clang compiler getting segmentation violations, and so on. I read this thread and was excited to see that a fix was available, so I purchased Fusion 6.0.2 last night and fired off a fresh build, but the problems still persist. So now I'm back to using NFS to share OS X mounts with my Ubuntu VM.

steve_goddard
VMware Employee

Sorry to hear that, and I am somewhat surprised. I assume you definitely have the latest vmhgfs version loaded and running?

I tested with the test script I used to generate this issue on three different OS versions with different Linux kernels and they all worked fine.

Can you please tell me more about your environment and what you are doing exactly to generate this issue?

For example: the version of Linux and the uname -a output, the exact steps of your development workflow, and maybe strace output captured while accessing the files, which you can send to me directly.

Thanks and sorry to hear that the changes have not improved things for you.

Steve

rainboxx
Contributor

Hey Steve, this issue (or a similar one) has been happening on one of our development machines since yesterday. It's really confusing, since we have almost the same environment on many development machines and it just appeared out of nowhere.

We use: VMware Fusion 6.0.2 (the latest as of now, according to VMware's update mechanism) on Mac OS X 10.9 Mavericks, also the latest as of now.

What happens? When extending files (adding more bytes to them), everything works as expected: the changes are visible within the virtual machine. When removing bytes somewhere in the file, the size only decreases by removing the same number of bytes from the end of the file. It seems that when working with vi or TextEdit, everything is synced correctly, but when working with Sublime Text 3 this occurs. We don't know whether it only happens with ST3 or with other applications too.

dehy
Contributor

Hello there,

I have the same issue on Fusion 6.0.2 with VMware Tools up to date and the same configuration: files on OS X 10.9 shared via HGFS to Linux.

The file read via OS X is correct. The file read via Linux in the VM is truncated at the end. Adding 1 byte to the file fixes the issue.

It occurs with Sublime Text 3 but not with Coda, for example. I filed a ticket with Sublime Text as well (Sublime Forum topic "Save not reliable").

It may be a saving method Sublime Text uses that is not handled correctly by VMWare Tools and HGFS...

Thank you for keeping this issue high on your priority list. I'm also a customer who bought Fusion to be sure of having a dedicated team listening to customers :)

ChipMcK
Hot Shot

My two cents

Does anyone have this working anywhere? The same file, opened twice separately, with non-synchronized updates, and not getting the file blitzed/corrupted?

As I recall from Programming 101, you get what you asked for: garbage.

brendoncraword
Contributor

I am getting this same problem on VMware Workstation.

Product: VMware Workstation 10.0.1 build-1379776

Host OS: Ubuntu 12.04 bit (Kernel 3.2.0-56-generic)

Guest OS: Ubuntu 10.04 64 bit (Kernel 2.6.32-38-server)

VMware Tools: 9.6.1-1378637

Typical use case:

1. Run python file in guest

2. While python file is still running in guest, edit python file on host using Emacs

3. Stop and start python file in guest

Results:

The python file will be truncated at some arbitrary location

dom124214
Contributor

If you read the history, you'll see it's a single writer, multiple readers. Readers should still see states of the file that don't include insertion of garbage.

IvarREW
Enthusiast

Hi Steve - thanks for your input.

There are definitely major issues with the hgfs driver included with Fusion 6.0.2.

See the following threads for likely related problem reports:

Given the similarity of the symptoms described, this might be down to one issue.

I've already spent a considerable amount of time diagnosing, working around and reporting the issue and am still waiting. I've completely rebuilt my development environment to work around this bug so for me the damage is done and there is no more time to be lost, but my workaround situation involves compromises I'd rather not make.

Is there a way for those of us with a strong vested interest in tracking the resolution of this issue to see what is happening (or to help out in any way)?

steve_goddard
VMware Employee

Hi there,

Sorry to say, I don't know of any way to track this issue automatically. I will update here when I know that it is fixed, and you should expect to see the fix in a tools release.

Steve

rainboxx
Contributor

It seems that by downgrading to the VMWare Tools 6.0.1, downloaded from https://softwareupdate.vmware.com/cds/vmw-desktop/fusion/6.0.1/1331545/packages/ as mentioned in this thread: https://communities.vmware.com/thread/462747, we solved our issue for now. Still looking forward to a proper solution because we have similar issues with a 5.0.1 version, too.

The similar issue with VMware 5.0.1 can be described as follows: our PHP application generates a bunch of files at once within the guest system (Ubuntu) on the shared folder (Mac) and loads them immediately. During loading, the files are not properly written or read, because PHP throws a syntax error for the last line of the file. When we check the file manually, everything is OK. When we do this again, a different file produces the same error. Preparing the files upfront, so that they are only read and not written, avoids this issue.

Michael_Bender
Contributor

Did this ever get fixed?

steve_goddard
VMware Employee

Sorry, still working on this and testing.

I will update here when I can give you better information. Sorry for the delays.

Steve

31415926535
Contributor

I'm seeing what looks like the same symptoms in 6.0.2 - but there's no concurrent access to the file.

I edit a text file in emacs on the OS X host, removing a dozen characters from the middle of the file, save and exit. If I cat the file on the OS X host I see the correct content. The file on the Ubuntu 12.04 guest has the same size and metadata, but a dozen characters have been removed from the end of the file instead.

The file has different content in host and guest, in other words.
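To make the symptom concrete, here is a tiny model of what the guest appears to show. It is only an assumption-laden illustration of the observed behaviour (new size paired with old contents), not a description of how vmhgfs works internally; guest_view is a hypothetical name.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the reported symptom: after the host shrinks a file,
// the guest seems to show the OLD contents cut down to the NEW size.
// Files that grow or keep their size are assumed to be seen correctly.
std::string guest_view(const std::string& old_content, const std::string& new_content) {
    if (new_content.size() >= old_content.size()) {
        return new_content;
    }
    std::string view = old_content.substr(0, new_content.size());
    assert(view.size() == new_content.size());  // same size as the host copy
    return view;
}
```

For example, if the host copy changes from "alpha beta gamma" to "alpha gamma", this model yields "alpha beta " in the guest: the right length, but with the characters missing from the end rather than from the middle.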

steve_goddard
VMware Employee

Yes, files that are edited on the host in between accesses in the VM will cause issues. That is a problem I introduced while trying to address the concurrent handle accesses from the guest. These changes only went into 6.0.2.

I am still trying to reconcile the fix with the host modification issue.

Thanks.

Steve

rainboxx
Contributor

Hey Steve, were you able to track down the issue to a specific point? Was the issue introduced in the VMware Tools shipped with 6.0.2, or does it depend on VMware Fusion 6.0.2 itself? I'm asking because I would like to try whether a downgrade would work.

Thanks,

Matthias

steve_goddard
VMware Employee

Hi, Matthias,

Yes, I tracked down how I caused it, and it is ONLY in the Linux vmhgfs kernel client.

The VMware Fusion application and the server side of the Shared Folders feature remain unchanged in this regard. So simply downgrading the tools installed in your VM to 6.0.1 is sufficient.

However, when downgrading your VM, first stop the VM. Edit the VMX file for the VM and verify the following setting:

tools.upgrade.policy = "manual"

After this it will be okay to restart your VM and perform the downgrade to the 6.0.1 tools. Note that if you have not verified the VMX file setting and it is set to update the tools automatically, then immediately after you downgrade, the update will run and install the 6.0.2 tools once more. Changing the above setting will prevent this from occurring.

To edit the VMX file see the following:

VMware KB: Editing the .vmx file for your VMware Fusion virtual machine

Thanks for your help and understanding.

Steve

Michael_Bender
Contributor

Thanks for tracking this down, Steve. Will there be an update with this fix in it soon?
