Re: Fusion kernel panic #5 and #6 - one panic every ~12-16 hours

NeilBradley · 08-31-2007

This is getting depressing fast. I submitted a bug report on this two days ago: one request for more info, my reply with plenty of info, then dead silence. ;-( It doesn't happen when Fusion isn't running. I'm not using DirectX 8 emulation. The guests are Windows XP and FreeBSD, and the XP guest has never had a life outside of Fusion. Please, anyone, help me.

Fri Aug 31 22:21:50 2007

panic(cpu 1 caller 0x00191931): pmap_flush_tlbs() timeout pmap=0xcb5ecb0 cpus_to_signal=0x10

Backtrace, Format - Frame : Return Address (4 potential args on stack)

0x4d833b18 : 0x128d08 (0x3cc0a4 0x4d833b3c 0x131de5 0x0)
0x4d833b58 : 0x191931 (0x3cf2fc 0xcb5ecb0 0x10 0x190ad5)
0x4d833bc8 : 0x193c6b (0x5da3bc90 0x5da3bc98 0x0 0x19e23a)
0x4d833c18 : 0x16535c (0xcb5ecb0 0x71d92000 0x0 0x71d93000)
0x4d833d08 : 0x165b10 (0x71d93000 0x0 0x0 0x0)
0x4d833d48 : 0x1830e2 (0xcbc7b0c 0x71d92000 0x0 0x71d93000)
0x4d833d78 : 0x155f86 (0xcbc7b0c 0x71d92000 0x2a 0x11cc32)
0x4d833db8 : 0x12b4c3 (0xede41a4 0xc76dea8 0x0 0x0)
0x4d833df8 : 0x124b17 (0xede4100 0x0 0x28 0x4d833edc)
0x4d833f08 : 0x195a42 (0x4d833f44 0x0 0x0 0x0)
0x4d833fc8 : 0x19b32e (0xdfd8140 0x0 0x19e0b5 0xc2ad4c4)
No mapping exists for frame pointer
Backtrace terminated-invalid frame pointer 0xbffff068

Kernel version:

Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386

Thu Aug 30 12:05:43 2007

panic(cpu 5 caller 0x00191931): pmap_flush_tlbs() timeout pmap=0x4b3700 cpus_to_signal=0x18

Backtrace, Format - Frame : Return Address (4 potential args on stack)

0x4d823b58 : 0x128d08 (0x3cc0a4 0x4d823b7c 0x131de5 0x0)
0x4d823b98 : 0x191931 (0x3cf2fc 0x4b3700 0x18 0x190ad5)
0x4d823c08 : 0x193c6b (0xfeb4a888 0xfeb4a8a8 0x0 0x19a986)
0x4d823c58 : 0x16535c (0x4b3700 0x69511000 0x0 0x69515000)
0x4d823d48 : 0x165b10 (0x69515000 0x0 0x1 0x0)
0x4d823d88 : 0x162e1f (0xf9bf3c 0x69511000 0x0 0x69515000)
0x4d823db8 : 0x353390 (0xf9bf3c 0x69511000 0x4000 0xc7b1518)
0x4d823dd8 : 0x353517 (0x1bd0cb8 0xc6ada10 0xe300950 0x0)
0x4d823e08 : 0x353563 (0xc72e1b8 0x283 0x4d823e58 0x140867)
0x4d823e38 : 0x332c97 (0xcad10cc 0xc72e000 0x4d823e98 0x198dfc)
0x4d823ec8 : 0x334470 (0xe300950 0xcad10cc 0xc72e000 0x1992a3)
0x4d823f18 : 0x33453a (0xc72e000 0x17 0xe300950 0x3)
0x4d823f58 : 0x37ad83 (0xc72e000 0xdf51780 0xdf517c4 0x0)
0x4d823fc8 : 0x19b28e (0xdf06920 0x0 0x19e0b5 0xdf06920)
No mapping exists for frame pointer
Backtrace terminated-invalid frame pointer 0xbffff338

Kernel version:

Darwin Kernel Version 8.10.1: Wed May 23 16:33:00 PDT 2007; root:xnu-792.22.5~1/RELEASE_I386

admin · 09-01-2007

The thing is, backtraces really aren't useful to the average person (me included); you need builds with symbols to decode them. Only the devs can say what's going on, and it's a long weekend, so I wouldn't count on a reply before Tuesday. The last three or four lines are the same across all your panics, which suggests that whatever is going on, it's the same issue each time.
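Even without symbols, you can see how much two panics have in common by lining up their raw return addresses. The short Python sketch below does that for the two logs in the first post. The addresses are copied by hand from those logs, and the script compares only the numbers, so it says nothing about which kernel functions they belong to.

```python
# Return addresses copied from the two panic logs above, listed top to
# bottom (i.e. starting with the frame nearest the panic() call).
PANIC_AUG31 = [0x128d08, 0x191931, 0x193c6b, 0x16535c, 0x165b10,
               0x1830e2, 0x155f86, 0x12b4c3, 0x124b17, 0x195a42,
               0x19b32e]
PANIC_AUG30 = [0x128d08, 0x191931, 0x193c6b, 0x16535c, 0x165b10,
               0x162e1f, 0x353390, 0x353517, 0x353563, 0x332c97,
               0x334470, 0x33453a, 0x37ad83, 0x19b28e]


def shared_leading_frames(a, b):
    """Return the return addresses two backtraces share, from the top down."""
    shared = []
    for x, y in zip(a, b):
        if x != y:
            break  # the stacks diverge here
        shared.append(x)
    return shared


if __name__ == "__main__":
    common = shared_leading_frames(PANIC_AUG31, PANIC_AUG30)
    print(len(common), [hex(addr) for addr in common])
```

For these two logs, it reports that the five frames nearest the panic call (0x128d08 through 0x165b10) are identical. Below that point, the two stacks go their separate ways.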

ksc · 09-01-2007

Kernel panics like this (ones in the kernel itself, not in VMware code) are painfully hard to debug without reproducing them on a system at VMware. (And judging from the specs you gave in your original post, 8-CPU, 9 GB Macs aren't all that common!)

From the error and the backtrace, it appears that one CPU is doing a normal cleanup that requires communicating with the other CPUs, and that cross-CPU communication times out. This tells me two things: (1) it's a race condition that depends on the random scheduling of two of your eight CPUs, which makes it extremely hard to reproduce; and (2) the useful information that would tell us which code is misbehaving (and not answering fast enough) is on a different CPU, which isn't in this backtrace. (Nor is there a way for you to get at that information.)
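The panic lines themselves may hint at which CPU was unresponsive. This rests on an assumption that nobody in the thread confirms: that, as in the i386 pmap code in xnu of that era, bit n of `cpus_to_signal` stands for CPU n, and that the printed value is the set of CPUs that still hadn't answered when the timeout fired. Under that assumption, a minimal Python decoder looks like this:

```python
# ASSUMPTION: bit n of cpus_to_signal corresponds to CPU n, and the value
# in the panic line is the set of CPUs that had not yet acknowledged.
PANICS = [
    # (CPU that panicked, cpus_to_signal from the panic line)
    (1, 0x10),  # Fri Aug 31 22:21:50 2007
    (5, 0x18),  # Thu Aug 30 12:05:43 2007
]


def cpus_in_mask(mask):
    """Return the CPU numbers whose bits are set in mask, lowest first."""
    cpus = []
    cpu = 0
    while mask:
        if mask & 1:
            cpus.append(cpu)
        mask >>= 1
        cpu += 1
    return cpus


if __name__ == "__main__":
    for panicking_cpu, mask in PANICS:
        print(f"cpu {panicking_cpu} still waiting on cpus {cpus_in_mask(mask)}")
```

If the assumption holds, CPU 4 shows up in both panics (0x10 decodes to [4] and 0x18 to [3, 4]). That would fit the idea that the misbehaving code was running on a CPU other than the one that panicked, although two samples are not much to go on.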

(The backtrace is almost the same as this one: http://lists.apple.com/archives/ata-scsi-dev/2006/Dec/msg00000.html - which basically says some code somewhere else is hanging on to a CPU for too long.)

We'll have to hope we can reproduce this in-house. The only workaround I can suggest is turning off some of your CPU cores; I know that's not realistic in your case. Your machine is just too powerful.

NeilBradley · 09-02-2007

Doesn't VMware install some sort of hypervisor down at the OS level? Couldn't that adversely affect the OS? I realize many kernel panics look similar to one another, but the fact that I have never seen a kernel panic without VMware running, yet have seen them with Fusion running, leads me to believe it's Fusion-related. I have provided

I guess in the meantime I could set the number of virtual CPUs on the XP VM to 1 instead of 2 (I've already turned off DirectX). Would that help at all?

So far, knock on wood, I'm going on almost 2 days with no crashes with both VMs running. Perhaps I should be quiet now lest I jinx it.

HPReg · 09-17-2007

Neil,

This panic happens when the Mac OS X kernel wants to do a TLB shootdown: one CPU sends IPIs to all the other CPUs. If at least one CPU doesn't acknowledge the IPI before the timeout expires, the kernel panics. The timeout is usually very long (on the order of 1 s, IIRC).

In practice, it means something disabled interrupts on a CPU for more than 1 s...
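The following toy model in Python makes the mechanism concrete. It is not xnu's actual code: threads stand in for CPUs, clearing a bit in a shared mask stands in for acknowledging the IPI, and the hypothetical `stalled_cpu` parameter models a CPU that keeps interrupts disabled for longer than the timeout. The timings are scaled down from the ~1 s mentioned above so that the example runs quickly.

```python
import threading
import time


class ShootdownTimeout(Exception):
    """Stands in for the kernel panic when some CPU never acknowledges."""


def tlb_shootdown(num_cpus, initiator, stalled_cpu=None,
                  stall_s=1.0, timeout_s=0.3):
    """Toy TLB shootdown: signal every other CPU, wait for all of them to ack.

    Returns the mask of CPUs that were signalled (bit n = CPU n).
    Raises ShootdownTimeout if any CPU is still pending after timeout_s.
    """
    cond = threading.Condition()
    pending = 0
    for cpu in range(num_cpus):
        if cpu != initiator:
            pending |= 1 << cpu
    signalled = pending

    def responder(cpu):
        nonlocal pending
        if cpu == stalled_cpu:
            # Hypothetical stall: a CPU running with interrupts disabled
            # can't service the IPI until whatever it's doing finishes.
            time.sleep(stall_s)
        with cond:
            pending &= ~(1 << cpu)  # "flush local TLB", then acknowledge
            cond.notify_all()

    # "Send the IPIs": every other CPU starts handling the interrupt.
    for cpu in range(num_cpus):
        if cpu != initiator:
            threading.Thread(target=responder, args=(cpu,), daemon=True).start()

    # The initiator waits until every bit clears or the timeout expires.
    with cond:
        if not cond.wait_for(lambda: pending == 0, timeout=timeout_s):
            raise ShootdownTimeout(f"shootdown timeout, still waiting on {pending:#x}")
    return signalled


if __name__ == "__main__":
    print(hex(tlb_shootdown(8, initiator=1)))  # every CPU answers promptly
    try:
        tlb_shootdown(8, initiator=1, stalled_cpu=4)
    except ShootdownTimeout as exc:
        print("panic:", exc)
```

When every CPU answers promptly, the call returns the mask of CPUs it signalled (0xfd for 8 CPUs with CPU 1 initiating). Stalling CPU 4 makes it raise with 0x10 still pending, which has the same shape as the panic lines in this thread. A single slow CPU is enough to trigger the timeout, no matter how many others respond.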

In the first few days after we wrote our vmmon kext, we did hit this bug: we were doing something really stupid, like sleeping for 1 s with interrupts disabled. We fixed it and have never seen the problem again. Ever.

In any case, it would help us tremendously if you could narrow down the cause of the issue. For example:

o Does it happen randomly while a VM is running, or just when you start (or stop) a VM?

o What is the minimum number of host CPUs you need to reproduce the problem? (Try disabling them one by one.) I can guarantee you need at least 2 CPUs to hit it, but who knows, maybe in your setup it only starts happening when you enable the 4th CPU.

o Does it still happen when you boot with only 1 GB or 2 GB of RAM? If so, we have a better chance of reproducing it in-house.
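Each of these experiments costs half a day or more of uptime, so it helps to decide up front how long a clean run must last before a configuration counts as "doesn't reproduce". Here is a back-of-the-envelope sketch. It assumes panics arrive independently at a steady average rate of one every 12-16 hours, that is, a Poisson process. Nothing in the thread establishes this.

```python
import math


def p_no_panic(hours, mean_interval_h):
    """P(no panic in `hours`), assuming a Poisson process with this mean gap."""
    return math.exp(-hours / mean_interval_h)


def hours_for_confidence(confidence, mean_interval_h):
    """Clean-run length needed to reject the old panic rate at `confidence`."""
    # Solve exp(-T / m) = 1 - confidence for T.
    return -mean_interval_h * math.log(1.0 - confidence)


if __name__ == "__main__":
    for m in (12, 16):
        print(f"mean {m} h: P(48 h clean) = {p_no_panic(48, m):.3f}, "
              f"95% needs {hours_for_confidence(0.95, m):.0f} h")
```

Under that assumption, a 48-hour run without a panic would happen only about 2-5% of the time if the original rate still applied. Each configuration (CPU count, RAM size) would need roughly 36-48 clean hours to reach 95% confidence.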
