VMware Communities
cdmckill
Contributor

Hangs & Disk Corruption using Ubuntu 7.10 w/Fusion 1.1 + 10.5.1

I have been dealing with this for the past couple of weeks. Turns out that several people at work have also been seeing it with the same configuration.

Leopard 10.5.1

VMware Fusion 1.1

Ubuntu 7.10

As soon as there is any heavy disk I/O under Linux (a compile, or a large svn checkout), the VM hangs. The process under Mac OS is unkillable, and I have to force a shutdown of my MacBook Pro (it will never reboot on its own; it sits spinning forever). When I get the Linux system back up and running, the entire filesystem is corrupted, most times beyond repair.

We have tried with and without VMware Tools; it doesn't have any effect. No settings were changed beyond what the Create New Virtual Machine wizard sets up.

Is this a known issue at VMware? None of us had any problems with 1.0 on Tiger, so I suspect it is either 1.1 or Leopard that is giving us the trouble.

thanks,

chris

bapper
Contributor

Sorry to hear that you also ran into the problem. Just one question: did you also have the "Optimize for Mac OS application performance" option selected in the VMware Fusion Preferences menu?

I had it set to "Optimize for guest OS" (or whatever the opposite setting is), but I believe I tried just about every configuration I could readily set and still hit the issue. I even tried turning off networking, USB, CD-ROM, etc. to see if anything there affected it. I hit it every time I tried to get some work done, so it wasn't something I could stick with. This was mostly with Fedora 7 and 8, but I also saw it with some Linux installs running custom-built 2.6.2x kernels. I tried it with and without ACPI and APM enabled, and with single and dual CPU settings. I can't remember whether I tried IDE instead of SCSI emulation. I have to take the next couple of weeks off, so while I'm sitting around visiting family I'll come up with a quick reproducer for you, since I'd like to see this fixed.

FWIW, my Mac is a MacBook Pro Core 2 Duo, 2.33 GHz with 3 GB of RAM. My Leopard install was an upgrade and not a fresh install, but it sounds like that doesn't matter. I didn't see this problem on Tiger; I used Fusion all day, every day for a couple of months before Leopard and didn't see any real issues. Had Parallels not worked, I would have gone back to Tiger.

jared_oberhaus
Contributor

OK, I tried really hard for a while to get my Windows XP VM to hang, but I couldn't. It has happened to me in the past, it has happened to posters on this thread, and a friend of mine had it happen to him last week.

But I can easily replicate this hang using Ubuntu 7.x.

This is what I did:

  • Opened Ubuntu VM

  • sudo /usr/bin/vmware-toolbox

  • Started the Shrink procedure for my root drive (something that does lots of disk activity)

  • Started Time Machine in OS X

  • Before the Shrink procedure could finish "preparing" for the shrink, I got the freeze.

  • I captured several kernel dumps.

Attached are four stack traces, and txt files describing at what point I did them. Here are the four stages:

01 - as soon as the vm froze... time machine still running

02 - After time machine completed, but VMware Fusion front end still running (and not responding)

03 - After killing VMware Fusion front-end and vmware-vmx process

04 - Tried to shutdown, but kernel is hanging...

This attachment is a 28 KB tgz, but I also have a 186 MB tgz that contains the full kernel dumps. If you need that, maybe we can arrange a way to transfer it?

And to anyone else, getting kernel dumps actually isn't that hard. Here is the page that you want to read:

http://developer.apple.com/technotes/tn2004/tn2118.html

Note that kdumpd is distributed with OS X, but it's a pretty trivial C program that runs from xinetd; if you download the source, you can compile it without modification on Linux, for instance. You'll have to go here:

http://www.opensource.apple.com/darwinsource/

to find the source for your kernel:

http://www.opensource.apple.com/darwinsource/10.5/

and then you'll want the network_cmds project, which will contain kdumpd source.

Also note that when you do kernel dumps, you follow the instructions in the above tech note to make Cmd-Power do a kernel dump. Note that for the config suggested on that page it will dump your kernel and then continue execution. That's how I was able to capture the kernel dump at 4 separate points.
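
For anyone following the tech note: the dump behaviour is selected through the kernel's `debug` boot argument, which is a bitmask. The sketch below decodes such a value into flag names. Nothing in it comes from this thread: the flag names and bit values are quoted from memory of Apple's kernel debugging documentation, and `0xd44` is only assumed to be the value the tech note suggests, so check both against the tech note before relying on them.

```python
# Assumed flag table (from Apple's kernel debugging documentation, not from
# this thread). Verify against the tech note before use.
DEBUG_FLAGS = {
    0x0001: "DB_HALT",
    0x0002: "DB_PRT",
    0x0004: "DB_NMI",
    0x0008: "DB_KPRT",
    0x0010: "DB_KDB",
    0x0020: "DB_SLOG",
    0x0040: "DB_ARP",
    0x0080: "DB_KDP_BP_DIS",
    0x0100: "DB_LOG_PI_SCRN",
    0x0200: "DB_KDP_GETC_ENA",
    0x0400: "DB_KERN_DUMP_ON_PANIC",
    0x0800: "DB_KERN_DUMP_ON_NMI",
    0x1000: "DB_DBG_POST_CORE",
    0x2000: "DB_PANICLOG_DUMP",
}


def decode_debug_flags(value):
    """Return the names of the flags set in a debug= bitmask, lowest bit first.

    Bits that are not in the table are reported as a single unknown(...) entry.
    """
    names = []
    remaining = value
    for bit in sorted(DEBUG_FLAGS):
        if value & bit:
            names.append(DEBUG_FLAGS[bit])
            remaining &= ~bit
    if remaining:
        names.append("unknown(0x%x)" % remaining)
    return names


if __name__ == "__main__":
    # 0xd44 is an assumed example value, not one quoted in this thread.
    for name in decode_debug_flags(0xD44):
        print(name)
```

If the table is right, a value without `DB_DBG_POST_CORE` set would match the behaviour described above, where the machine keeps running after the dump.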

bgertzfield
Commander

Hi Jared,

This is excellent stuff. I deeply appreciate your hard work getting these kernel stack traces. I've attached them to the bug we're using to internally track this.

So, to recap, you are:

1) Using Time Machine

2) Switching between "Optimize for Mac OS application performance" and "Optimize for virtual machine disk performance" (which doesn't affect the issue)

Are you also using FileVault on this machine?

In each of your kernel stack traces, I see this:

task        vm_map      ipc_space   #acts  pid  proc        command
0x049db7f8  0x04357f78  0x03ab97e8  4      840  0x0437e4d0  diskimages-helpe

thread      processor   pri  state  wait_queue  wait_event
0x04e395d0  0x005360c0  31   W      0x05224edc  0x5313e0

0x3434fb78  0x439e07 <IOUserClient::externalMethod(unsigned int, IOExternalMethodArguments*, IOExternalMethodDispatch*, OSObject*, void*)+837>
0x3434fbd8  0x4379b0

I would not at all be surprised if this were a bug in Apple's com.apple.driver.DiskImages driver.

SeasideMan
Contributor

I switched my network from bridged to NAT last night, then did a pretty large update, and it didn't fail. It's just a data point; I haven't tried switching back and forth to see if it really makes a difference.

sisaac
Contributor

In my case, I'd turned off Time Machine for testing, and I've never used FileVault. I did a clean installation of Leopard, installed no non-Apple software other than Fusion, started to install CentOS in a VM, and it hung 97% of the way through the install.

The optimize for Mac OS app performance/optimize for VM performance setting hasn't made a difference for me.

jared_oberhaus
Contributor

1) Yes, I am using Time Machine; that is, I'd like to... but I have stopped for now ;)

2) I'm actually not switching back and forth on that setting, I'm staying on "Optimize for OS X". But others (as you know) have replicated this with the other setting.

I am not using FileVault, but now that you mention it, I think I can give you some more info:

  • I'm pretty sure I've made my VM hang without Time Machine going;

  • I can't remember making it happen without a USB drive plugged in, but I could be wrong about that.

  • Now I do remember that at one point this happened when I mounted a disk image to install some software I downloaded while a VM was busy. That locked it up. Also I now remember that last night when I replicated this I had the Kernel debugging kit image mounted. Maybe that's another prerequisite.

  • Probably unrelated, my iMac G5 had locked up the disk image stuff real good. The machine was super busy (probably Time Machine) and I had 4 or 5 disk images mounted, and tried to unmount one; then clicked again and again, and eventually it got to the point that no matter how long I waited I couldn't unmount any disk images, and I had to reboot. But, on my Intel MacBook Pro, when these hangs happen I've never been unable to unmount a disk image or a USB drive. Of course, that doesn't mean these issues aren't related.

Thanks!

bgertzfield
Commander

Jared, all,

This is even more interesting. Do you all have disk images mounted when this hang occurs? Does the hang reproduce if you eject the disk image?

bapper
Contributor

bgertzfield wrote:

Jared, all,

This is even more interesting. Do you all have disk images mounted when this hang occurs? Does the hang reproduce if you eject the disk image?

I had thought the problem might be related to having my USB drive connected, since the VM's CD-ROM image was on it when I saw the issue, so I moved everything I had been running from it to the internal drive. I still saw the problem, so I don't think it is related, though it could exacerbate the issue.

SeasideMan
Contributor

No disk image mounted any time this occurred for me. I just did another update since I switched from bridged to NAT and this one worked, too.

SeasideMan
Contributor

It'll take a while for me to debug further, as I'm working on getting a release out the door before the end of the year.

I've also seen it on our MacBook Pros, so it's not isolated to a single machine. Let me know where I can FTP an image to you.

cthree
Contributor

I do have other images mounted besides internal drives, 1 FW800, 1 USB and an iDisk image. I haven't tried with the external drives unmounted.

jared_oberhaus
Contributor

Since I captured those kernel dumps, I have taken great care not to plug in any USB device or mount any disk image while VMware Fusion is running (I'd rather not rebuild my VMs again ;)). Since then I have not had a single VM freeze, and I'm using Ubuntu and Windows heavily. But it has only been a couple of days...

I think that correlating this to a disk image makes a lot of sense in relation to FileVault--it's just a mounted encrypted disk image, right?

Anyhow, assuming this is a bug in OS X, maybe we can send the kernel dumps to Apple? Maybe someone on this board has access to and can check this against a 10.5.2 build (optimistically thinking they've fixed this)? I assume VMware has some sway with Apple to get bugs fixed...

cdmckill
Contributor

So, any progress on this issue? Not having external drives mounted is not really an option for me. I would be happy for VMware to confirm that this is an Apple issue.

bgertzfield
Commander

We've reproduced the bug and written a small test case that deadlocks Mac OS X 10.5.1 without VMware Fusion running.

We filed the bug with Apple as Radar bug 5679432.

So far, we've only reproduced it when the "Optimize for Mac OS application performance" preference is set. (For the technically inclined, this preference simply sets the F_NOCACHE setting to 1 on opened disk files.)
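
For readers who want to see what that flag looks like in practice, here is a minimal sketch of opening a file with F_NOCACHE set and pushing some writes through it. It is only an illustration: it is not the test case filed with Apple and not Fusion's own code. The helper names (`open_unbuffered`, `write_blocks`) are made up for this example, and it assumes Python's `fcntl` module exposes `F_NOCACHE` on Mac OS X; on platforms where it doesn't, the sketch falls back to ordinary buffered writes.

```python
import fcntl
import os

# F_NOCACHE is a Mac OS X fcntl command; elsewhere this evaluates to None.
F_NOCACHE = getattr(fcntl, "F_NOCACHE", None)


def open_unbuffered(path):
    """Open path for writing and, where supported, turn off kernel caching
    for that descriptor. Returns (fd, nocache_enabled)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    if F_NOCACHE is None:
        return fd, False
    fcntl.fcntl(fd, F_NOCACHE, 1)  # 1 disables caching, 0 re-enables it
    return fd, True


def write_blocks(path, block_count=64, block_size=1 << 20):
    """Write block_count blocks of block_size zero bytes to path.

    Returns the total number of bytes the kernel accepted."""
    fd, _ = open_unbuffered(path)
    block = b"\0" * block_size
    written = 0
    try:
        for _ in range(block_count):
            written += os.write(fd, block)
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    # Hypothetical scratch file; pick a location with enough free space.
    print(write_blocks("nocache-scratch.bin", block_count=8))
```

On its own this just writes a scratch file; whether it triggers anything on an affected host depends on conditions this thread has not pinned down.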

Folks, if you've run into hangs when you're not using the "Optimize for Mac OS application performance" setting, can you confirm:

1) If you're using Time Machine or third-party backup software

2) If you're using disk images (.dmg) on the host when this happens

Thanks,

Ben

jared_oberhaus
Contributor

Thanks! This is great...

pjhinton
Contributor

bgertzfield writes:

We've reproduced the bug and written a small test case that deadlocks Mac OS X 10.5.1 without VMware Fusion running.

We filed the bug with Apple as Radar bug 5679432.

So far, we've only reproduced it when the "Optimize for Mac OS application performance" preference is set. (For the technically inclined, this preference simply sets the F_NOCACHE setting to 1 on opened disk files.)

Folks, if you've run into hangs when you're not using the "Optimize for Mac OS application performance" setting, can you confirm:

1) If you're using Time Machine or third-party backup software

2) If you're using disk images (.dmg) on the host when this happens

I would like to contribute another data point for this issue.

Host Hardware:

Model Identifier: MacBookPro3,1

Processor Name: Intel Core 2 Duo

Processor Speed: 2.2 GHz

Number Of Processors: 1

Total Number Of Cores: 2

L2 Cache: 4 MB

Memory: 2 GB

Host Hard Disk:

Capacity: 111.79 GB

Model: FUJITSU MHY2120BH

Revision: 0081000D

VMware Fusion Version: 1.1 (62573)

Virtual Machine: CentOS Minimal Virtual Appliance, Version 1.1

Observed behavior: During operations with heavy disk access (svn copy and svn checkout are common culprits), VMware Fusion becomes unresponsive (beachball). After several minutes, the virtual machine reverts to an off state. VMware Fusion responds to menu commands. However, attempts to reboot the virtual machine result in an error message about not being able to acquire an exclusive lock.

The host computer will not shut down completely on its own. It has to be forcibly shut down by holding down the power button.

The log file for one such session is attached. Note that from 9:02 am onward, there are a large number of messages of the form:

vmx| SCSI0:0: Command WRITE(10) took x.xxx seconds (ok)

This coincides with the first of several Subversion command invocations.
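
For anyone who wants to pull these messages out of their own log, here is a rough sketch of a filter. It matches only the fragment quoted above; whatever precedes `vmx|` on the line (a timestamp, for instance) is ignored rather than parsed, since I am not assuming anything about its format. The function name, the one-second threshold, and the sample durations are all made up for illustration.

```python
import re

# Matches the quoted fragment only, e.g.
#   vmx| SCSI0:0: Command WRITE(10) took 1.234 seconds (ok)
SLOW_CMD = re.compile(
    r"vmx\|\s+(?P<device>\w+:\d+):\s+Command\s+(?P<command>\S+)\s+"
    r"took\s+(?P<seconds>\d+(?:\.\d+)?)\s+seconds"
)


def slow_commands(lines, threshold=1.0):
    """Return (device, command, seconds) for every matching log line whose
    reported duration is at least threshold seconds."""
    hits = []
    for line in lines:
        match = SLOW_CMD.search(line)
        if match is None:
            continue
        seconds = float(match.group("seconds"))
        if seconds >= threshold:
            hits.append((match.group("device"), match.group("command"), seconds))
    return hits


if __name__ == "__main__":
    # Sample lines with invented durations, not taken from the attached log.
    sample = [
        "vmx| SCSI0:0: Command WRITE(10) took 2.500 seconds (ok)",
        "vmx| SCSI0:0: Command WRITE(10) took 0.200 seconds (ok)",
    ]
    for hit in slow_commands(sample):
        print(hit)
```

To run it over a real log, pass an open file object as `lines`.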

To answer bgertzfield's questions:

  • Performance in VMware Fusion Preferences is set to "Optimize for virtual machine disk performance"

  • Time Machine is disabled on the host operating system.

  • No third-party backup software is in use on the host operating system.

  • No disk image (.dmg) files are mounted by the host operating system.

In addition, FileVault is not in use on the host operating system.

jared_oberhaus
Contributor

bgertzfield, thanks a ton for your work on this problem. I see that Fusion 1.1.1 works around this by always using "Optimize for VM performance". In the past when using that option, my kernel_task would use much more memory because of the buffered IO; I might try using that option again with 1.1 to see how well it works on my current machine.

But I hope that when Apple fixes 5679432 there will be a 1.1.2 that will again allow "Optimize for Mac OS application performance". And I sure hope it got fixed in 10.5.2...

bmatheny
Contributor

I had a repro scenario under 10.5.1 using VMware Fusion 1.1.1. With CentOS 5, I could consistently make Fusion beachball and become unresponsive by doing an svn checkout from our repository.

Tonight I upgraded to 10.5.2 and had high hopes that with 1.1.1 and 10.5.2 this issue would be resolved. Unfortunately the issue persists.

This makes software development in our environment nearly impossible. Does VMware have a fix on the horizon or do I need to stop using Fusion?

-Blake

jared_oberhaus
Contributor

Does anyone know if Apple Radar bug 5679432 got fixed in 10.5.2? Given the last comment, it appears not. Also, I have two Mac friends who both run VMware Fusion with Windows, and they have kernel panics and other issues with closing their Mac lids while VMware Fusion is running Windows XP full-screen. I know 1.1.1 has a workaround to avoid the Apple bug, but it might be manifesting itself in a different way on those machines.

Note that I haven't seen any beach ball freezes since I upgraded to 1.1.1 and 10.5.2.

jared_oberhaus
Contributor

Thanks VMware and Apple for fixing this:

VMware Fusion 1.1.1 incorporated a workaround for Apple bug 5679432 (Mac OS X hang under heavy disk load when unbuffered I/O is in use). That version of Fusion disabled unbuffered disk I/O on Mac OS X 10.5 hosts, even if the user selected Optimize for Mac OS application performance in VMware Fusion preferences. Apple fixed this bug in Mac OS X 10.5.3, so VMware Fusion 1.1.3 removes the workaround when the host operating system is Mac OS X 10.5.3 or higher.
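
To make the version matrix in that note easier to follow, here is a small sketch of the decision as I read it. It is my own paraphrase, not Fusion's actual logic: the function names are hypothetical, and it reads the 10.5.3 check as applying to the Mac OS X version on the host, since that is where the bug lives.

```python
def parse_version(text):
    """Turn a dotted version string such as '10.5.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def unbuffered_io_enabled(fusion, host, user_wants_unbuffered):
    """Return True if unbuffered disk I/O would be in effect.

    fusion: VMware Fusion version, e.g. '1.1.1'
    host: host Mac OS X version, e.g. '10.5.2'
    user_wants_unbuffered: True when "Optimize for Mac OS application
        performance" is selected in preferences
    """
    if not user_wants_unbuffered:
        return False
    fusion_v = parse_version(fusion)
    host_v = parse_version(host)
    # Fusion 1.1.1 and later: workaround applies on 10.5 hosts.
    workaround = fusion_v >= (1, 1, 1) and host_v[:2] == (10, 5)
    # Fusion 1.1.3 and later: workaround dropped once the host has Apple's fix.
    if fusion_v >= (1, 1, 3) and host_v >= (10, 5, 3):
        workaround = False
    return not workaround


if __name__ == "__main__":
    for fusion, host in [("1.1", "10.5.1"), ("1.1.1", "10.5.2"), ("1.1.3", "10.5.3")]:
        print(fusion, host, unbuffered_io_enabled(fusion, host, True))
```

Under this reading, the only combination that is both unbuffered and exposed to the Apple bug is Fusion 1.1 on a 10.5 host older than 10.5.3.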
