VMware Communities
dom124214
Contributor

hgfs file corruption when using different file handles to the same file in same process, different threads

Hello all,

I'm encountering a problem when using shared folders where files that are written to by a single process are being corrupted if another thread in that same process merely reads from the same file under a separately opened file handle.

This is occurring with VMware Fusion 4.1.4 on Mac OS X 10.7.5, using CentOS 5.8 as the guest OS. The corrupted files end up with blocks of zero bytes overwriting a portion of their data. The size of the blocks does not generally match the length of the missing data.

I have attached the source code to a simple C++ test program that exhibits the problem fairly consistently. A Makefile is included to build the executable.

Usually it hits the bad case on the second or third attempt when it writes out to a shared folder. It doesn't hit the bad case at all when it's outputting to the local file-system. The test program returns non-zero if it reproduces the error, so a simple shell loop can be used to continually run the program until the bad case is hit.

The program has two threads. Each thread has a separately opened file handle to the same file. The first thread opens the two file handles. It first creates a handle to a file for writing, and then it opens a read handle to read back from the file being written out to.

It then sets up the second thread, which is given a file handle for writing, and this second thread writes ever-increasing consecutive integers to the file until it is signalled to stop. After writing out a single unsigned integer, it flushes the file.
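For readers without the attachment, here is a minimal sketch of what the writer loop described above might look like. It is not the attached program: the name `write_consecutive` and the `max_count` cap are hypothetical, added only so the loop can be exercised on its own, and the stop flag is assumed to be a `std::atomic<bool>` set by the first thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Hypothetical helper, not the attached program: writes 0, 1, 2, ... to `out`,
// flushing after every integer, until `stop` is set or `max_count` is reached.
// Returns the number of integers written.
unsigned write_consecutive(std::FILE* out, const std::atomic<bool>& stop,
                           unsigned max_count) {
    assert(out != nullptr);
    unsigned next = 0;
    while (!stop.load() && next < max_count) {
        if (std::fwrite(&next, sizeof next, 1, out) != 1) {
            break;  // stop on a write error
        }
        std::fflush(out);  // flush after each integer, as described above
        ++next;
    }
    return next;
}
```

In the real test the second thread would run a loop like this until signalled, with no count limit.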

The first thread continually uses its read file handle to seek to randomly chosen 64 KB boundaries in the file and attempts to read 64 KB from each. This mimics behaviour in our development system where we first encountered the corruption. The first thread performs no writes of its own to the write file handle after calling fopen(); it only calls fseek() and fread() on the read file handle, as well as stat() on the filename. It's unknown whether the 64 KB boundary reads are significant to reproducing the problem.
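The reader side can be sketched roughly as follows. This is an illustration under assumptions rather than the attached code: `random_block_offset` and `read_block_at` are hypothetical names, and the current file size is assumed to have come from the stat() call mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <random>
#include <vector>

constexpr long kBlockSize = 64 * 1024;  // 64 KB

// Pick a random 64 KB-aligned offset that does not lie beyond `file_size`.
long random_block_offset(long file_size, std::mt19937& rng) {
    assert(file_size >= 0);
    std::uniform_int_distribution<long> pick(0, file_size / kBlockSize);
    return pick(rng) * kBlockSize;
}

// Seek the read handle to `offset` and try to read one 64 KB block.
// Returns the number of bytes actually read (may be short near end of file).
std::size_t read_block_at(std::FILE* in, long offset, std::vector<char>& buf) {
    assert(in != nullptr);
    buf.resize(kBlockSize);
    if (std::fseek(in, offset, SEEK_SET) != 0) {
        return 0;
    }
    return std::fread(buf.data(), 1, buf.size(), in);
}
```

Note that nothing here writes to the file; in the scenario described, reads like these on a second handle are apparently enough to trigger the corruption on a shared folder.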

If SIGINT is received, or a time limit given on the command line expires, the second thread is signalled to stop after writing out any current integer and a subsequent fflush(). The first thread waits on the second thread, and then attempts to verify the output file, reading unsigned integers and checking that they are consecutively numbered from 0. If there's a mismatch, and it's a 0, then more integers are read from the file until we have the full block of zeros. The file position and the number of zeros (and the number of bytes) are then output to stdout.
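The verification pass described above could look something like the sketch below. The `Mismatch` struct and `find_mismatch` function are hypothetical names for illustration, not taken from the attachment, and the sketch reports only the first mismatch it finds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Where the first mismatch starts and how many zero integers follow it.
struct Mismatch {
    std::size_t byte_offset;  // file position of the first bad integer
    std::size_t zero_count;   // length of the zero run, in integers
};

// Reads unsigned integers from the current position of `in` and checks that
// they count up from 0. Returns true and fills `out` on the first mismatch;
// returns false if every integer read was the expected one.
bool find_mismatch(std::FILE* in, Mismatch& out) {
    assert(in != nullptr);
    unsigned expected = 0;
    unsigned value = 0;
    std::size_t index = 0;
    while (std::fread(&value, sizeof value, 1, in) == 1) {
        if (value != expected) {
            out.byte_offset = index * sizeof(unsigned);
            out.zero_count = 0;
            // Keep reading until the whole block of zeros has been consumed.
            while (value == 0) {
                ++out.zero_count;
                if (std::fread(&value, sizeof value, 1, in) != 1) {
                    break;
                }
            }
            return true;
        }
        ++expected;
        ++index;
    }
    return false;
}
```

The size of the zero block in bytes is then `zero_count * sizeof(unsigned)`, which is what would be printed alongside the file position.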

Any assistance, or even just confirmation of the bug, would be greatly appreciated. It would be nice to know whether similar problems have been encountered in the past, and whether it's likely to be fixed in the near future. Any further questions, or things for me to try, please feel free to ask.

Cheers and thanks,

Dominic

45 Replies
steve_goddard
VMware Employee

Hi Michael,

Thanks for tracking this down Steve. Will there be an update with this fix in it soon?

I don't know when the next releases are scheduled, so I cannot say. However, it is also general VMware policy that I would not be allowed to say even if I did know. Sorry.

I will let you know when this is fixed and likely to be out so you can reset the tools update setting for the Linux VMs and so you can try out the newer versions.

Thanks for your patience.

Steve

willoller
Contributor

I am having what seems to be a related issue - using

Mavericks Host / Ubuntu Guest

Node.js file watcher (chokidar) https://www.npmjs.org/package/chokidar

and I am wondering if it's been solved as of 6.0.2

If it's solved, then I'll be digging deeper to figure it out...

Thanks!

steve_goddard
VMware Employee

I have this fixed internally now.

I am trying to get the changes in so that they can make it into a tools release soon.

Sorry for the delays in getting these issues dealt with.

Thanks.
Steve

steve_goddard
VMware Employee

This issue is now fixed and due to be released in the upcoming tools releases.

Please recheck with the next tools releases.

(Note: I am not allowed to say exactly when this will occur. That is VMware policy, since schedules change, and I am not always informed of those types of schedule changes anyway.)

Thanks for everyone's help and your patience.

Steve

qo
Contributor

Hmm, there doesn't seem to be any confirmation in the 6.0.3 release notes that this was resolved.

VMware Fusion 6.0.3 and VMware Fusion 6.0.3 Professional Release Notes

Perhaps there's a separate set of release notes for VMware Tools? Does anyone know if Steve's fix was integrated into 6.0.3?

Thanks!

dariusd
VMware Employee

Workstation 10.0.2 and Fusion 6.0.3 use the same set of VMware Tools, which contain the fix.

I don't know why it was omitted from the Fusion 6.0.3 release notes...

--

Darius
