Running Fusion 3.0.1 on a 32 bit booted Macbook Pro 10.6.2, my 10.6 guest now randomly panics in various processes (servermgrd, for example). This didn't happen with 10.6.1.
Am I alone? It's a throwaway VM... did I get unlucky?
panic.log? Are you using multiple vCPUs (if so, try dropping down to 1 - SMP OS X guests are very finicky).
Just one cpu. and no panic.log being generated, for reasons I can't ascertain. (you mean PanicReporter log, yes? panic.log died in 10.4?)
I enabled two-kernel debugging and I'll get a backtrace from gdb.
no panic.log being generated, for reasons I can't ascertain. (you mean PanicReporter log, yes? panic.log died in 10.4?)
Yes, whatever it's being called these days (I thought panic.log was through 10.5, and PanicReporter + date sounds right for 10.6). The file with the backtrace thingamajig ![]()
I enabled two-kernel debugging and I'll get a backtrace from gdb.
That's probably good too.
(gdb) where
#0 Debugger (message=0x5cf868 "panic") at /SourceCache/xnu/xnu-1486.2.11/osfmk/i386/AT386/model_dep.c:866
#1 0x0021b2bd in panic (str=0x59062c "\"Spinlock acquisition timed out: lock=%p, lock owner thread=0x%lx, current_thread: %p\"@/SourceCache/xnu/xnu-1486.2.11/osfmk/i386/locks_i386.c:346") at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/debug.c:303
#2 0x0029c556 in usimple_lock (l=0x869f20) at /SourceCache/xnu/xnu-1486.2.11/osfmk/i386/locks_i386.c:346
#3 0x002312d7 in timer_call_enter1 (call=0x84b804, param1=0x3cb23d4, deadline=12832588820473) at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/timer_call.c:185
#4 0x0022678f in thread_dispatch (thread=0x3b7eb7c, self=0x3cb23d4) at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/sched_prim.c:1777
#5 0x002269cf in thread_invoke (self=0x3b7eb7c, thread=0x3cb23d4, reason=0) at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/sched_prim.c:1539
#6 0x00226f27 in thread_block_reason (continuation=0x548626 <IOWorkLoop::threadMain()>, parameter=0x3d55800, reason=<value temporarily unavailable, due to optimizations>) at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/sched_prim.c:1850
#7 0x00226f92 in thread_block_parameter (continuation=0x548626 <IOWorkLoop::threadMain()>, parameter=0x3d55800) at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/sched_prim.c:1875
#8 0x005486e2 in IOWorkLoop::threadMain (this=0x3d55800) at /SourceCache/xnu/xnu-1486.2.11/iokit/Kernel/IOWorkLoop.cpp:353
All seem to involve spinlock acquisition timeout, as if there's a clock issue.
VMware Tools was installed, however, trying to install it again just in case.
This time, fseventsd, and I can confirm VMware Tools was up to date.
#0 Debugger (message=0x5cf868 "panic") at /SourceCache/xnu/xnu-1486.2.11/osfmk/i386/AT386/model_dep.c:866
#1 0x0021b2bd in panic (str=0x59062c "\"Spinlock acquisition timed out: lock=%p, lock owner thread=0x%lx, current_thread: %p\"@/SourceCache/xnu/xnu-1486.2.11/osfmk/i386/locks_i386.c:346") at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/debug.c:303
#2 0x0029c556 in usimple_lock (l=0x847e3c) at /SourceCache/xnu/xnu-1486.2.11/osfmk/i386/locks_i386.c:346
#3 0x0022505e in thread_setrun (thread=0x4b4d3d4, options=5) at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/sched_prim.c:2471
#4 0x00225b93 in thread_go (thread=0x4b4d3d4, wresult=1) at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/sched_prim.c:519
#5 0x00227222 in thread_timer_expire (p0=0x4b4d3d4, p1=0x0) at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/sched_prim.c:337
#6 0x00231049 in timer_queue_expire (queue=0x81e04c, deadline=39677196370918) at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/timer_call.c:286
#7 0x0029583e in etimer_intr (inuser=0, iaddr=57) at /SourceCache/xnu/xnu-1486.2.11/osfmk/i386/etimer.c:95
#8 0x002a661c in rtclock_intr (tregs=0x2b0fb9a0) at /SourceCache/xnu/xnu-1486.2.11/osfmk/i386/rtclock.c:542
#9 0x002ae599 in lapic_interrupt (interrupt_num=, ub=4302858416, xsecurity=0, xsecurity_size=0, isstat64=1) at /SourceCache/xnu/xnu-1486.2.11/bsd/vfs/vfs_syscalls.c:4079
#17 0x002e5beb in lstat64 (p=0x4a14a80, uap=0x5f67ba8, retval=0x5e1f1b4) at /SourceCache/xnu/xnu-1486.2.11/bsd/vfs/vfs_syscalls.c:4113
#18 0x004ee947 in unix_syscall64 (state=0x5f67ba4) at /SourceCache/xnu/xnu-1486.2.11/bsd/dev/i386/systemcalls.c:433
We haven't tested 10.6.2 in a Fusion 3 VM much, as 10.6.2 hadn't been released at the time we released Fusion 3.0.1.
Thanks for providing the panic logs, that is very helpful.
There is only one piece of code which uses the string "Spinlock acquisition timed out:" in the kernel source code (xnu), and it is present in both the 10.6.1 and the 10.6.2 kernel, so the fact that you are starting to see this seems unrelated to your 10.6.1 -> 10.6.2 upgrade.
To workaround this issue, I recommend you add the following option to your kernel boot arguments:
slto_us=0xffffffff
which will effectively bump the spinlock acquisition timeout to about 4300 seconds.
Then run for a while and let us know what you observe:
1) If the VM seems to run fine, then the issue is related to virtualization, but it is impossible to fix by design. We need to work with Apple so they bump their timeout when they detect that they are virtualized.
2) If the VM sometimes now hangs (for 4300 seconds!) instead of panicing, then this is a bug in the Apple kernel and you should report it to them.
Now that being said, since you said your VM only has one virtual CPU, then I don't think it can possibly be #1. I think it has to be #2, i.e. I think there is a buggy code path in the Apple kernel which attempts to acquire a spinlock (apparently on the interrupt path) which is already acquired (probably by a non-interrupt path). So I think you should let Apple's Panic Reporter report the issue to Apple engineers, so they can work on it.
PanicReporter doesn't offer, however, I've used bugreport.apple.com enough times to submit incidents before that I know the drill.
I'm just surprised it's not manifesting anywhere else, including the host.
PanicReporter doesn't offer
It does not pop-up when you restart after the panic, or it pops-up but does not offer to send the data to Apple?
I'm just surprised it's not manifesting anywhere else, including the host.
Thread scheduling in the guest is very different in a VM than it is on the host. That is because unlike physical CPUs, virtual CPUs can be descheduled on the host.
As a result, VMs often exacerbate race conditions in kernels which are real bugs, but which hit much less often on physical machines.
I didn't change the spinlock setting yet; I got another panic in the meantime. The missing symbols are in com.apple.security.sandbox, per the panic screen. no symbols in the kernel debug kit.
This time, CPU failing to respond to interrupts. I suppose I should read xnu source code.
#0 Debugger (message=0x5cf868 "panic") at /SourceCache/xnu/xnu-1486.2.11/osfmk/i386/AT386/model_dep.c:866
#1 0x0021b2bd in panic (str=0x58dd98 "\"pmap_flush_tlbs() timeout: \" \"cpu(s) failing to respond to interrupts, pmap=%p cpus_to_respond=0x%lx\"@/SourceCache/xnu/xnu-1486.2.11/osfmk/i386/pmap.c:5019") at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/debug.c:303
#2 0x0028d2b4 in pmap_flush_tlbs (pmap=0x84b6e0) at /SourceCache/xnu/xnu-1486.2.11/osfmk/i386/pmap.c:5017
#3 0x0028f0ce in pmap_page_protect (pn=261988, prot=0) at /SourceCache/xnu/xnu-1486.2.11/osfmk/i386/pmap.c:2559
#4 0x0026b423 in vm_object_pmap_protect (object=0x4f27784, offset=0, size=20480, pmap=0x0, pmap_start=326148096, prot=0) at /SourceCache/xnu/xnu-1486.2.11/osfmk/vm/vm_object.c:2682
#5 0x002606b2 in vm_map_delete (map=0x2387d74, start=<value temporarily unavailable, due to optimizations>, end=326168576, flags=5, zap_map=0x0) at /SourceCache/xnu/xnu-1486.2.11/osfmk/vm/vm_map.c:5180
#6 0x00260c82 in vm_map_remove (map=0x2387d74, start=326148096, end=326168576, flags=1) at /SourceCache/xnu/xnu-1486.2.11/osfmk/vm/vm_map.c:5289
#7 0x00259a5d in kmem_free (map=0x2387d74, addr=326148096, size=19880) at /SourceCache/xnu/xnu-1486.2.11/osfmk/vm/vm_kern.c:709
#8 0x002200e7 in kfree (data=0x1370a000, size=19880) at /SourceCache/xnu/xnu-1486.2.11/osfmk/kern/kalloc.c:516
#9 0x1c55004c in ?? ()
#10 0x1c43d678 in ?? ()
#11 0x1c55032b in ?? ()
#12 0x1c5506af in ?? ()
#13 0x1c54ee93 in ?? ()
#14 0x0057deac in mac_vnode_check_open (ctx=0x46dfd14, vp=0x38ecb90, acc_mode=259) at /SourceCache/xnu/xnu-1486.2.11/security/mac_vfs.c:784
#15 0x002ef817 in vn_open_auth (ndp=0x224f3d38, fmodep=0x224f3cec, vap=0x224f3e8c) at /SourceCache/xnu/xnu-1486.2.11/bsd/vfs/vfs_vnops.c:384
#16 0x002e8ffa in open1 (ctx=0x46dfd14, ndp=0x224f3d38, uflags=258, vap=0x224f3e8c, retval=0x46dfc54) at /SourceCache/xnu/xnu-1486.2.11/bsd/vfs/vfs_syscalls.c:2307
#17 0x002e94a8 in open_nocancel (p=0x49e3000, uap=0x6101188, retval=0x46dfc54) at /SourceCache/xnu/xnu-1486.2.11/bsd/vfs/vfs_syscalls.c:2511
#18 0x004ee947 in unix_syscall64 (state=0x6101184) at /SourceCache/xnu/xnu-1486.2.11/bsd/dev/i386/systemcalls.c:433
I'm familiar with that panic. Same root cause, different panic message, same way to workaround it.
But I'm really confused now. We have only ever seen this panic in SMP VMs (VMs with multiple virtual CPUs). Are you 100% certain that your VM only has one virtual CPU? Can you send the .vmx file?
You're right, I'm an idiot. I'd been looking at the settings for my 10.5 vm (1 cpu) and not 10.6:
numvcpus = "2"
Sigh.
No problem. It happens.
If you have multiple virtual CPUs, the workaround of passing this as a kernel boot argument:
slto_us=0xffffffff
will solve your problem then. It only works on 10.6 guests though (10.5 kernels do not recognize this option).