VMware Communities
cdmckill
Contributor

Hangs & Disk Corruption using Ubuntu 7.10 w/Fusion 1.1 + 10.5.1

I have been dealing with this for the past couple of weeks. Turns out that several people at work have also been seeing it with the same configuration.

Leopard 10.5.1

VMware Fusion 1.1

Ubuntu 7.10

As soon as there is any heavy disk I/O under Linux (a compile, or a large svn checkout), the VM hangs. The process under Mac OS is unkillable, and I have to force a shutdown of my MacBook Pro (it will never reboot on its own; it sits spinning forever). When I get the Linux system back up and running, the entire filesystem is corrupted, most times beyond repair.

We have tried both with and without VMware Tools; it doesn't have any effect. No settings were changed beyond what the Create New Virtual Machine wizard sets up.

Is this a known issue to VMware? None of us had any problems with 1.0 on Tiger, so I suspect it is either 1.1 or Leopard that is giving us the trouble.

thanks,

chris

60 Replies
cdmckill
Contributor

I just tried again tonight -- both settings caused the hang up.

chris

jared_oberhaus
Contributor

Thanks!

I've been running with the setting "Optimize for Mac OS application performance" since I started using Fusion.

SeasideMan
Contributor

Nope, I'm using "Optimize for virtual machine disk performance". When I'm running VMware, I only care about that app. I'm not running anything on my Mac concurrently except a terminal session to ssh to the VMware Xubuntu guest OS. I was trying to just walk away from the machine while it was loading Ubuntu updates in the hopes that not using the mouse and keyboard might help avoid the problem. It hasn't.

bgertzfield
Commander

> I just tried again tonight -- both settings caused the hang up.

Hi Chris,

What kind of hardware are you running this on? Can you describe the exact steps you took to reproduce the hang?

Also, did you shut down your virtual machines before changing the setting? (The setting doesn't take effect until you restart your VM.) We're trying to reproduce it in-house now, and we want to pinpoint the circumstances that cause this issue.

SeasideMan
Contributor

Here's a simple way to reproduce this on a standard Intel-based 24" iMac with 2GB RAM.

Install 64-bit Ubuntu 7.10 Desktop edition from the CD. Run the Update Manager, which should tell you it has a large number of updates (more than 50). Wait for the updates to download; it should then hang when it tries to unpack them.

I'm not on my home machine right now, so I can't guarantee the 7.10 installer will need the updates. It's possible the Ubuntu team keeps the installer up to date. I could send you a zipped VMware image tonight but it will take me a while to strip out all the extras, and it's a 1.3 GB file, so I'd need an FTP site to upload it.

bgertzfield
Commander

> Here's a simple way to reproduce this on a standard Intel based 24" iMac with 2GB RAM.

> Install a 64-bit Ubuntu 7.10 Desktop edition from the CD. Run the update manager, which should tell you it has a large number of updates. i.e. >50. Wait for the updates to download, and then it should hang when it tries to unpack the updates.

As you might guess, we do this test quite often, and it doesn't hang in our testing. There's something different about your setup (different preferences, third-party software installed, extra hardware) that's tickling this issue.

jared_oberhaus
Contributor

I just got the response to my latest support request (199767001) this morning, informing me that Ubuntu 7.10 is not a supported OS.

I responded that this issue exists on Windows as well, and even if it didn't, this indicates a serious problem somewhere deep in the disk emulation (at least that's my guess).

If there's a problem reproducing this at VMware, is it possible I can help by using gdb on my machine? Or should I run some sort of kernel debugger? Or get kernel logging messages that might be useful? Let me know... I have experience with Linux and Windows kernel drivers, so I can find my way around a debugger.

cdmckill
Contributor

>> I just tried again tonight -- both settings caused the hang up.

> Hi Chris,

> What kind of hardware are you running this on? Can you describe the exact steps you took to reproduce the hang?

> Also, did you shut down your virtual machines before changing the setting? (The setting doesn't take effect until you restart your VM.) We're trying to reproduce it in-house now, and we want to pinpoint the circumstances that cause this issue.

I am using a 2.4 GHz MacBook Pro with 4 GB of RAM installed. I am running a clean install of 10.5 with the 10.5.1 update applied. No other hardware drivers or system services have been installed. I did a clean install to make sure there wasn't anything funny with my original 10.5 install that could be causing VMware grief.

One way I was able to make this happen a lot faster is to increase the guest's RAM to 1 GB, enable multiple CPUs for the virtual machine, and make sure that VMware Tools is installed. If I keep it to a single CPU, 512 MB of RAM, and no VMware Tools, it takes a lot longer to reproduce.

chris

cthree
Contributor

Do it with CentOS 5. RHEL 5 IS a supported guest OS and you'll get the same result. Pretty lame of them to blow you off like that.

rcardona2k
Immortal

In minor defense of Support, the number of "one-off" cases to support, even when it's a configuration used by 90+% of the community, is combinatorially large. However, Apple are the masters of this when it comes to their RAM parts, hard disk parts, etc. I know it's a bad analogy of hardware vs. software.

Look at the bonus/challenge this way: if you can reproduce the problem at the center of the sweet spot of the supported matrix, then that's irrefutable evidence that a vendor MUST fix the problem. Anything else is frankly speculation and uncertainty.

Shower VMware with facts backed by supported configurations and they will deliver.

cthree
Contributor

huh?

rcardona2k
Immortal

> Do it with CentOS 5. RHEL 5 IS a supported guest OS and you'll get the same result.

Good

Asking for support with Ubuntu 7.10

Bad

> Pretty lame of them to blow you off like that.

Ugly

Opinions.

cthree
Contributor

It is a blow-off. Saying Ubuntu 7.10 is unsupported doesn't fix the problem or help the customer. All it does is churn the problem through the help desk. If he used RHEL 5.1 he would have the same problem as documented above. The problem remains, and a number of customers are left with a product which doesn't do what it's supposed to.

The problem is Linux, not Ubuntu. I've been asking about this for almost 2 months, ever since Leopard was released, and the threads keep getting merged, feigning a desire to gather information and resolve it.

bgertzfield
Commander

It is absolutely a general disk backend issue, and it's unrelated to the guest. We're trying to reproduce it and figure out what the bug is. Unfortunately, there seem to be three or four different reproduction scenarios here, none of which are 100% repro cases.

The support folks really don't have the ability to make the call whether a particular bug is or is not guest-specific; they can't tell the difference between someone running a totally unsupported guest like OS/2, and someone running an almost-supported guest like the latest Ubuntu or Fedora versions. I know that isn't a really satisfying answer, but that's probably why the support team was unable to directly answer your question.

cthree
Contributor

Thank you!

That is great news. Since I seem to be able to reproduce this problem with ease, I am more than willing to work with you to get the information you need to identify the conditions of the crash and hopefully reproduce it yourselves in your lab. Feel free to email me with instructions for gathering whatever info you need.

Clear and straightforward communications are good.

jared_oberhaus
Contributor

I agree with bgertzfield. The tech support guys are doing the best they can to insulate the developers from the thousands of support requests that would keep them from getting anything done. I also appreciate his help in getting to the bottom of this.

As for collecting info, would it help to capture a kernel dump when this happens next? I'm reading the Apple technotes now on setting up kdumpd and enabling NMI when you hit Cmd-Power so the dump can be generated. Just wondering if bgertzfield thinks this will produce information that is useful; when VMware Fusion hangs, the rest of the machine still works--except it won't shut down. I believe that getting such a kernel dump might show where the Fusion kernel module was stuck, but I don't know if that's the case.

bgertzfield
Commander

Thanks, Jared.

It would definitely help to capture a list of kernel stacks when this happens. If you can attach a kernel debugger and use the 'showallstacks' function from the 'kgmacros' script in the Kernel Debug Kit, that'll give us exactly the information we need.

bapper
Contributor

> It is absolutely a general disk backend issue, and it's unrelated to the guest. We're trying to reproduce it and figure out what the bug is. Unfortunately, there seem to be three or four different reproduction scenarios here, none of which are 100% repro cases.

I've switched back to Parallels after hitting this problem because I don't need x86_64 support right now and Parallels doesn't crash on me, but I'm sure I will need it in our next round of development, so I'll need this fixed (or Parallels with x86_64 support) by then. If you're interested, I'll look into creating a small (8MB or so) Linux VM and a program or script that will reproduce this. I see it when compiling since I'm a cross-compiling maniac, but like you said, I'm certain it is disk activity that triggers it.

I've not done any OS X kernel work before, but this might be a good opportunity to cut my teeth on at least the tools available to get a stack trace, too. :)
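
For anyone who wants to try the "program or script" idea in the meantime, here is a minimal sketch of a disk-activity generator. It is not a confirmed reproduction of the hang. It assumes a Python 3 interpreter is available in the guest (Ubuntu 7.10 does not ship one by default), and the function names, file counts, and sizes are all made up for illustration. It writes random files, forces them to disk with fsync, then re-reads them and compares SHA-256 digests. A re-read right after the write may be served from the guest's page cache, so a clean result does not prove the virtual disk is intact.

```python
import hashlib
import os
import tempfile


def write_files(directory, count, size_bytes):
    """Write `count` files of random data, fsync each, return {path: sha256 hex}."""
    digests = {}
    for i in range(count):
        data = os.urandom(size_bytes)
        path = os.path.join(directory, "stress_%05d.bin" % i)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the (virtual) disk
        digests[path] = hashlib.sha256(data).hexdigest()
    return digests


def verify_files(digests):
    """Re-read every file and return the paths whose contents no longer match."""
    mismatched = []
    for path, expected in digests.items():
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).hexdigest()
        if actual != expected:
            mismatched.append(path)
    return mismatched


def run_stress(count=20, size_bytes=1 << 20, rounds=1):
    """Run write/verify rounds in a temporary directory; return total mismatches."""
    total_bad = 0
    for _ in range(rounds):
        with tempfile.TemporaryDirectory() as directory:
            digests = write_files(directory, count, size_bytes)
            total_bad += len(verify_files(digests))
    return total_bad
```

Calling run_stress() with larger values, for example run_stress(count=2000, rounds=10), generates sustained writes closer to a big compile or svn checkout. Only run it in a VM you can afford to lose, given the corruption described in this thread.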

bgertzfield
Commander

> I've switched back to Parallels after hitting this problem because I no longer need x86_64 support right now and Parallels doesn't crash on me, but I'm sure I will need it in our next round of development so I'll need this fixed (or Parallels with x86_64 support) by then. If you're interested, I'll look into creating a small (8MB or so) Linux VM and program or script that will reproduce this. I see it when compiling since I'm a cross compiling maniac, but like you said, I'm certain it is disk activity that triggers it.

> I've not done any OS X kernel work before but this might be a good opportunity to cut teeth on at least the tools available to get a stack trace, too. :)

Hi bapper,

Sorry to hear that you also ran into the problem. Just one question: did you also have the "Optimize for Mac OS application performance" option selected in the VMware Fusion Preferences menu?

SeasideMan
Contributor

I can send you a complete Ubuntu Linux VM with VMware Tools and a few applications installed, along with a set of instructions to reproduce it. It's about 1.2GB though.

Sorry I can't be of much use in debugging.
