VMware Communities
rich_becks
Contributor

Fusion 1.1RC1 "beachballs" under heavy VM load (after switch to leopard)

I have recently upgraded to Leopard. Before the upgrade I had successfully used a Gentoo Linux system to cross-compile for an ARM platform using Fusion 1.1RC1. Ever since moving to Leopard, the VMware Fusion front-end locks up with a beachball during the kernel cross-compile. I had already excluded the "Virtual Machines" directory from Time Machine (though Time Machine itself is enabled), so it's probably not that. Looking at the processes, the vmware-vmx process sits at around 6% CPU, almost as if it is still running quite happily. However, the VMware Fusion front-end does not even respond to "Force Quit"; I have to kill it from Activity Monitor.

The worst thing about the beachball situation is that I cannot recover to full operation without restarting the machine, and I can't even do that without using the Ctrl+Apple+Power forced-restart combination. Needless to say this does not leave the virtual machine in a very good state (lots of dropped/duplicate inodes on a forced filesystem check after reboot).
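In case it's useful to anyone else hitting this, here is a rough sketch of forcing that full filesystem check inside the Linux guest after one of these hard resets. It assumes an ext3 root on /dev/sda1 (the device name and filesystem type are assumptions; adjust to your own layout):

# from a rescue shell or single-user mode, with the filesystem unmounted or read-only:
e2fsck -f /dev/sda1        # force a full check even if the journal looks clean
# or, on many distros, schedule a forced check for the next boot instead:
touch /forcefsck && reboot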

I am going to revert to 1.0 to see if things are better there.

Nothing of note in the logs, even with full debugging turned on:

Nov 06 17:33:24.469: vmx| VMMEM 0 1 74418 391617 391617 757760 -1 50

Nov 06 17:33:24.469: vmx| VMMEM checked 154433 bad 0 badKey 0 badMPN 0 badCOW 0

Nov 06 17:33:24.469: vmx| COWStats: numHints 61129 unique 17 shared 778 totalUnique 909 totalBreaks 20579

Nov 06 17:33:24.469: vmx| COWStats Hot Page: hash 0x559cc83f478f1bfc count 7

Nov 06 17:33:24.469: vmx| COWStats Hot Page: hash 0xbf7142bb49e776b0 count 22

Nov 06 17:33:24.469: vmx| COWStats Hot Page: hash 0xc7d48d444fc02b87 count 23

Nov 06 17:33:24.469: vmx| COWStats Hot Page: hash 0x78b8948d7412006c count 28

Nov 06 17:33:24.469: vmx| COWStats Hot Page: hash 0x2bd8e6157e6af6d6 count 671

Nov 06 17:33:24.469: vmx| VMMEM VM 0 min 59099 max 91867 share 65536 paged 73537 nonpaged 18330 locked 74401 cowed 778 usedPct 39

Nov 06 17:33:49.368: vmx| LOADAVG: 1.23 1.26 0.93

Nov 06 17:34:19.368: vmx| LOADAVG: 1.06 1.22 0.92

Using Leopard's process sampling tool (sample), the following thread activity is revealed (maybe this can be related back to the build, even without symbols):

Sampling process 146 for 3 seconds with 1 millisecond of run time between samples

Sampling completed, processing symbols...

Analysis of sampling vmware (pid 146) every 1 millisecond

Call graph:

1615 Thread_2503

1615 0x2f75

1615 0x305a

1615 0x6f5bf

1615 NSApplicationMain

1615 -[NSApplication run]

1615 -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]

1615 _DPSNextEvent

1615 BlockUntilNextEventMatchingListInMode

1615 ReceiveNextEventCommon

1615 RunCurrentEventLoopInMode

1615 CFRunLoopRunInMode

1615 CFRunLoopRunSpecific

1615 __CFSocketPerformV0

1615 __CFSocketDoCallback

1615 0x3a5954

1615 0x3a58f1

1615 0xcb369

1615 0xb6957

1615 0xb6837

1615 0x1109dc

1615 0xf356b

1615 0x750ee

1615 0x7c46e

1615 0x7ffbd

1615 0x88611

1615 0x87ba7

1615 0x8387e

1615 0x937ce

1615 0x920b7

1615 0x9b012

1615 0x9ab06

1615 0x3a8993

1615 readv

1615 readv

1615 Thread_2603

1615 thread_start

1615 pthreadstart

1615 select$DARWIN_EXTSN

1615 select$DARWIN_EXTSN

1615 Thread_2703

1615 thread_start

1615 pthreadstart

1615 __NSThread__main__

1615 -[NSThread main]

1615 -[NSUIHeartBeat _heartBeatThread:]

1615 -[NSConditionLock lockWhenCondition:]

1615 -[NSConditionLock lockWhenCondition:beforeDate:]

1615 -[NSCondition waitUntilDate:]

1615 pthread_cond_timedwait_relative_np

1615 pthreadcond_wait

1615 semaphore_timedwait_signal_trap

1615 semaphore_timedwait_signal_trap

1615 Thread_2803

1615 thread_start

1615 pthreadstart

1615 glvmDoWork

1615 pthread_cond_wait$UNIX2003

1615 __semwait_signal

1615 __semwait_signal

1615 Thread_2903

1615 thread_start

1615 pthreadstart

1615 CMMConvTask(void*)

1615 pthreadSemaphoreWait(t_pthreadSemaphore*)

1615 pthread_cond_wait$UNIX2003

1615 __semwait_signal

1615 __semwait_signal
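For reference, the trace above came from Leopard's command-line sample tool. The invocation that produces a 3-second, 1-ms-interval sample like the one shown is roughly the following (146 was the pid of the wedged Fusion GUI on my machine; yours will differ):

# sample pid 146 for 3 seconds at 1 ms intervals (matches the header above);
# add "-file <path>" if you want the report written to a specific file
sample 146 3 1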

Finally, my .vmx file:

config.version = "8"

virtualHW.version = "6"

numvcpus = "2"

scsi0.present = "TRUE"

scsi0.virtualDev = "lsilogic"

memsize = "256"

MemAllowAutoScaleDown = "FALSE"

scsi0:0.present = "TRUE"

scsi0:0.fileName = "Gentoo Split.vmdk"

ide1:0.present = "FALSE"

ide1:0.fileName = "/Users/rjones/Documents/ISO Images/install-x86-minimal-2006.1.iso"

ide1:0.deviceType = "cdrom-image"

floppy0.present = "FALSE"

ethernet0.present = "TRUE"

ethernet0.connectionType = "bridged"

ethernet0.wakeOnPcktRcv = "FALSE"

usb.present = "FALSE"

ehci.present = "TRUE"

sound.present = "FALSE"

sound.fileName = "-1"

sound.autodetect = "FALSE"

pciBridge0.present = "TRUE"

isolation.tools.hgfs.disable = "FALSE"

displayName = "Gentoo"

guestOS = "other26xlinux"

nvram = "Gentoo.nvram"

deploymentPlatform = "windows"

virtualHW.productCompatibility = "hosted"

RemoteDisplay.vnc.port = "0"

tools.upgrade.policy = "useGlobal"

powerType.powerOff = "soft"

powerType.powerOn = "soft"

powerType.suspend = "hard"

powerType.reset = "soft"

vmi.present = "TRUE"

#vmi.enabled = "TRUE"

vmi.pciSlotNumber = "33"

ethernet0.addressType = "generated"

uuid.location = "56 4d b1 29 51 d7 bc 85-7d f3 53 93 53 ba 81 bf"

uuid.bios = "56 4d bc ac e7 bc db 84-d0 b2 af b3 73 a5 0a a1"

scsi0:0.redo = ""

pciBridge0.pciSlotNumber = "17"

scsi0.pciSlotNumber = "16"

ethernet0.pciSlotNumber = "32"

sound.pciSlotNumber = "-1"

ehci.pciSlotNumber = "35"

ethernet0.generatedAddress = "00:0c:29:a5:0a:a1"

ethernet0.generatedAddressOffset = "0"

tools.remindInstall = "FALSE"

checkpoint.vmState = ""

ide1:0.startConnected = "FALSE"

tools.syncTime = "TRUE"

sound.startConnected = "FALSE"

gui.exitOnCLIHLT = "TRUE"

chipset.useAcpiBattery = "TRUE"

chipset.useApmBattery = "TRUE"

mks.keyboardFilter = "allow"

sharedFolder.option = "onetimeEnabled"

ethernet1.present = "TRUE"

ethernet1.connectionType = "nat"

ethernet1.wakeOnPcktRcv = "FALSE"

ethernet1.addressType = "generated"

ethernet1.pciSlotNumber = "34"

ethernet1.generatedAddress = "00:0c:29:a5:0a:ab"

ethernet1.generatedAddressOffset = "10"

ide0:0.present = "TRUE"

ide0:0.fileName = "/Library/Application Support/VMware Fusion/isoimages/linux.iso"

ide0:0.deviceType = "cdrom-image"

ide0:0.startConnected = "TRUE"

ethernet0.startConnected = "TRUE"

extendedConfigFile = "Gentoo.vmxf"

usb.generic.autoconnect = "FALSE"

rich_becks
Contributor

A further update to this: I can confirm that after winding back to Fusion 1.0 I no longer suffer from this problem.

The beachball lockup also totally trashed my XP SP2 VMware image; the disk was so corrupted that XP could not read the default registry hive and hence could not boot.

1.1RC1 users beware!

rich_becks
Contributor

The 1.1 full release also "beachballs" under heavy VM load. I can't even force-quit Fusion from the Dock!

andy_boyett
Contributor

As much as I hate "me too" responses, I'm experiencing the same issue with 1.1 final (build 62573) and can easily reproduce it.

The Fusion icon remains in the Dock (as if the program were running), regardless of which processes I kill (VMware Fusion, vmnet-natd, vmnet-netifup, vmware-dhcpd, vmnet-bridge, vmware-vmx).

The guest OS I'm seeing this with is a 64-bit Ubuntu 7.10 image. I do have Time Machine enabled, but have the Virtual Machines directory excluded. I'm going to first try disabling Time Machine and then try VMware Fusion 1.0.

andy_boyett
Contributor

1.1 Final works perfectly with Time Machine disabled. I have not been able to replicate the problem.

rcardona2k
Immortal

It's possible the "extra load" Time Machine places on OS X in general is causing problems. Maybe there are other applications whose data you need to exclude, like Final Cut or Aperture?

GabrielM
Contributor

Same problem here ever since upgrading to 1.1 and Leopard... and I'm not using Time Machine.

I need to kill all the processes manually and restart VMware. It has happened at least 5 times in the past month.

rich_becks
Contributor

I can confirm that it doesn't matter whether Time Machine is enabled or not. The only fix I have is to revert to the 1.0 release, which does not cause the problem even on Leopard. Something changed between 1.0 and 1.1RC1 onwards to cause this.

Update (after reading "Colour-wheel of death" thread):

I'm not using FileVault.

Time Machine makes no difference.

And to reiterate: the 1.0 release does not beachball on the same setup.

Richard J.

HPReg
VMware Employee

Does it happen with all your VMs, or just specific ones?

Does it happen every time you use a VM with Fusion 1.1, or only sometimes?

rich_becks
Contributor

I only have two VMs that I use regularly. It has been happening on my Gentoo-based Linux 2.6 distro. It happens when I perform a large source build; in other words, when the disk load of the guest is quite high.

I have not tried to see whether my other VM, Windows XP Pro, suffers from the same problem. I guess I would need to find an app that aggressively uses the disk over a long period of time, perhaps a benchmarking app?

I've had it happen pretty much every day with the Gentoo VM, usually at the peak of a large cross-platform build. All 1.1 candidates are affected; 1.0 is not.

It's a pretty serious deadlock condition because the vmware-vmx process will not die, even with kill -9. Under ps it ends up in state "Es", which means the process is trying to exit and is a session leader. The vmware process (the GUI) is also locked, showing just state "E". Once you get into this state there is nothing you can do but shut down, but even that fails because the processes refuse to die. You end up having to do a Ctrl+Apple+Power to sync the disks, followed by holding down the power button to power off.
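If anyone wants to gather the same data, the process states above were read straight out of ps; something along these lines will show them:

# list the state (STAT) column for all VMware processes;
# "E" means the process is trying to exit, a trailing "s" marks a session leader
ps -axo pid,stat,command | grep -i vmware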

I'm more than happy to provide further diagnostics as requested (Xcode tools, further process info, anything that you might need to catch the issue).

andy_boyett
Contributor

Unfortunately I can now confirm rich's experience completely. With Time Machine disabled it appears to be less likely, but it still occurs. Once the VM crashes, the various VMware processes match the state described by rich, and the machine cannot be cleanly rebooted or shut down.

jared_oberhaus
Contributor

I saw this 3 times in the last day or so, each time forcing me to reinstall my Linux distribution. I was running Ubuntu 7.04, now I'm running 7.10. I also have Fusion 1.1, and like the above posters, I think it's probably because I've been using Time Machine more often, and running big builds in my Linux VM.

I've filed a support request (#199767001) with VMware, and tried to supply them with all the information I can. I hope that helps.

Because of my problems with this, I'm going to start using Shared Folders with my Linux guest. I think that will also help prevent the issue, as I suspect it's a bug in the VMware kernel module that emulates the disk. In addition, if my VM is corrupted again, all the files that I care about are in my Mac file system. Also, once I get the courage to use Time Machine, this lets me back up my important files; using Time Machine to back up the entire VM disk image isn't practical.

By the way, see this post about using Shared Folders from Linux. It seems to work really well, but you probably have to make a mod to your /etc/fstab:

http://communities.vmware.com/thread/99000
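For what it's worth, the fstab change discussed in that thread boils down to a single line. A rough sketch, assuming VMware Tools (and its hgfs kernel module) is installed in the guest and you want the shares mounted under /mnt/hgfs with write access for uid/gid 1000 (those ids are an assumption; substitute your own user's):

# /etc/fstab entry mounting all Fusion shared folders via the hgfs driver
.host:/   /mnt/hgfs   vmhgfs   defaults,uid=1000,gid=1000   0   0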

mykmelez
Enthusiast

Because of my problems with this, I'm going to start using Shared Folders with my Linux guest. I think that will also help prevent the issue, as I suspect it's a bug in the VMware kernel module that emulates the disk.

That would explain why I haven't been experiencing the problem. My 300MB of Mozilla source code, which I compile regularly, lives on my Mac OS X host OS partition, from where it is accessed by my Linux and Windows VMs via Fusion shared folders when I compile Firefox.

By the way, see this post about using Shared Folders from Linux. It seems to work really well, but you probably have to make a mod to your /etc/fstab:

http://communities.vmware.com/thread/99000

Yup, that works great for me. But note that in most cases you only need to hack /etc/fstab if you want write access to the files in your VM.

Note that the same problem is being discussed in this other thread:

http://communities.vmware.com/thread/115466

andy_boyett
Contributor

Unfortunately I just experienced the same problem, under 1.0. This is making Fusion unusable.

royewest
Contributor

Well, this just bit me, with SCSI write errors in the VMware log. I thought that by deleting the VM lock files and running chkdsk I could recover, but no such luck -- Outlook wanted to start my experience as a new user when I launched it, so who knows what other disk corruption is going on.

Fortunately I'd used Carbon Copy Cloner to back up my Mac OS X boot drive about 36 hours ago, but what a nightmare.

I hesitate to file a support request because each time I do, I never get a resolution, just increasingly time-consuming tasks to perform for VMware. If that's a strategy to make me go away, it's working.

What is the prognosis for a fix to this, VMware?

royewest
Contributor

I suppose I should have mentioned that this was while running my only VM, Windows XP SP2, under very light load, with all current patches to XP and to OS X. I've had other problems with VMware in recent months with Leopard and Fusion 1.1 (build 62573), mostly having to do with its habit of destroying USB FAT drives when I accidentally let XP see them under VMware, but this is the first time I've had my VM so corrupted that I had to restore the entire thing from backups.

jared_oberhaus
Contributor

I just got a response from Technical Support regarding my support request. I got a second round of "workaround" requests, such as uninstalling/reinstalling VMware Tools and increasing virtual memory. I replied that I've tried all of those.

I just hope that this issue has already been addressed by an engineer and that my support request can provide additional technical details that will help.

Rootus
Contributor

I hope VMware is working hard on a solution for this; it's really killing me. I've had to hard-reboot my MacBook three times this morning. Most often it happens for me when I close the VM, though the third time this morning it happened after the screensaver activated.

I guess this weekend I'm going to repave the MacBook with Tiger. It hurts to do that, but this is just one of a few nags I've had since installing Leopard and it's murdering my productivity. I can't continue to waste time on this. I may try switching back to Parallels until VMware announces a fix.

rcardona2k
Immortal

I'm not trying to scare you off Parallels, but I can't run that #@! software anymore after going to Leopard (OS X 10.5.1). Within a few minutes my machine kernel-panics while accessing the Internet. I sent them a dozen reports like this, with no response, against Parallels 3.0 release build 5582. I've killed their kexts and I just don't run their software.

Your mileage may vary.

Problem report - Start

Please send this file to Parallels Software International, Inc.

-


Users bug description -


Backtrace terminated-invalid frame pointer 0

Kernel loadable modules in backtrace (with dependencies):

com.parallels.kext.Pvsnet(3.0)@0x4351a000->0x4351ffff

BSD process name corresponding to current thread: kernel_task

Mac OS version: 9B18

Kernel version:

Darwin Kernel Version 9.1.0: Wed Oct 31 17:46:22 PDT 2007; root:xnu-1228.0.2~1/RELEASE_I386

System model name: MacBookPro1,1
