VMware Communities
algawi86
Contributor

VMware Workstation monitor.suspend_on_triplefault fails

Hi,

I'm having some trouble with VMware. I'm running a nested hypervisor (one which I developed) within VMware Workstation, and sometimes I receive the following message:

VMware Workstation unrecoverable error: (vcpu-0)

vcpu-0:ASSERT vmcore/vmm/cpu/segment.c:679 bugNr=19580

A log file is available in "/home/asaf/vmware/Ubuntu/vmware.log". 

You can request support. 

I know this is a bug in my hypervisor, but I was wondering if someone from the VMware team, or anyone else who knows, could tell me what kind of assertion failed.

Thanks.

6 Replies
dariusd
VMware Employee

That assertion is configured to fail whenever the virtual machine's CPU encounters a triple-fault.  Our assumption is that if you are running with debugging enabled, a triple-fault is an event that you're likely to want to capture and debug, so it's wired up to trigger a virtual machine monitor panic by means of raising an assertion failure.  It won't do this when the VM is configured with Gather debugging information set to anything other than Full.
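For reference, the suspend-on-triple-fault behavior named in the thread title is enabled with a single .vmx option. A minimal sketch (the debugging level itself is set through the Gather debugging information UI setting described above, not shown here):

monitor.suspend_on_triplefault = "TRUE"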

With monitor.suspend_on_triplefault set, this should produce a pair of files named debug.guest (which is a trimmed-down .vmss file, not suitable for restoring the state of the whole VM, but containing enough CPU state to enable debugging) and debug.vmem (containing the state of the VM's memory).  Are those files not being produced?  Those two files should be suitable for use with vmss2core, which should be able to produce regular ELF corefiles from the VM's state at the time of the triple-fault.

Cheers,

--

Darius

algawi86
Contributor

I see, thanks for clarifying this, but now I'm having another problem.

I'm trying to generate the core file. Since the OS I'm running is Ubuntu 16.04 with a 4.4.0 kernel, I've used the script found here (with some modifications: rsp0 changed to sp0 for newer kernels):

http://stackframe.blogspot.co.il/2007/10/configuring-application-debugging-with.html

To generate the core file I'm using the following command:

vmss2core -l 0x404,0x3a0,0x0,0x350,0x600,0x448,0x9c0,0x40,0x18,0x30,0x4000,0x488,0x4f8,0x10 -N debug.guest debug.vmem

but I get the following error:

vmss2core version 5528349 Copyright (C) 1998-2017 VMware, Inc. All rights reserved.

VCPUs=2 memSize=0x80000

GuestDumpInitRegionInfo: regionInfo.count=1

GuestDump_CreateContext: vcpu0 mode=2

Started core writing.

Writing note section header.

Writing 1 memory section headers.

Writing notes.

Cannot translate linear address ffff88007b600000.

Cannot read kernel stack pointer.

Cannot read pointer to current task

Cannot write Elf header.

Finished writing core.

What am I doing wrong?

dariusd
VMware Employee

Unless you're actually using Red Hat Linux and its crash utility, I'd drop the "-l" and "-N" options from the vmss2core command line – just give vmss2core the filenames for debug.{guest,vmem}, nothing else.  Does it still fail?
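That is, assuming debug.guest and debug.vmem sit in the current directory, simply:

vmss2core debug.guest debug.vmem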

Can you post a vmware.log showing the failure here in the forums?  Just use the attach button in the lower-right when composing a reply... please don't copy-and-paste the entire log.  I'll take a peek at the log and see if I can provide some guidance.

Cheers,

--

Darius

algawi86
Contributor

Without any flags it fails, but with the -M flag I get the core file.

The vmware.log is attached.

dariusd
VMware Employee

So here's the monitor ring buffer from the log you provided.  The monitor ring buffer tracks various CPU emulation events, many of which are faults or exceptions, and only some of which are actually visible to the guest.  It's in reverse chronological order.  I'll walk through it and explain as I go:

------MONITOR RING BUFFER START(entries=256, indexUnwrapped 42835 entrySz=64)-----

082 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 #TRIPLE             000e CPL0 PROT 64-bit fluff=0000

081 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 NESTED #PF addr=    ffffffffff574080 CPL0 PROT 64-bit fluff=0000

080 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 #DF                 000e CPL0 PROT 64-bit fluff=0000

079 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 NESTED #PF addr=    ffffffffff5740e0 CPL0 PROT 64-bit fluff=0000

078 --- CS:XIP 0010 ffffffff81842bdd SS:XSP 0000 ffff8800337c6f88 #PF addr=           ffffffff81842bdd CPL0 PROT 64-bit fluff=0004

So the point where the guest clearly went off the rails was during fetch of the instruction at 10:ffffffff81842bdd (entry index 078) -- the fault address is equal to XIP (RIP in 64-bit code), so the instruction address was not mapped, and the CPU attempted to deliver a page fault.

The IDT entry for the #PF handler was also not mapped (entry index 079, with IDTR=ffffffffff574000), so our emulation raised a double-fault (entry index 080).  The IDT entry for the #DF handler was (unsurprisingly at this stage) also not mapped (entry index 081), so our emulation raised a triple-fault (entry index 082).  At that point, a physical host would have rebooted.
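To make the escalation sequence concrete, here is a toy sketch in C of the delivery logic those ring-buffer entries record. It is purely illustrative, not VMware's emulation code; idt_entry_mapped() is a hypothetical predicate standing in for the page-table walk the CPU performs when it reads an IDT descriptor:

#include <stdbool.h>
#include <stdio.h>

enum vector { DF = 8, PF = 14 };   /* x86 exception vectors */

/* Hypothetical: true if the IDT entry for 'vec' is reachable through
 * the current paging structures.  In the log above, neither was. */
static bool idt_entry_mapped(int vec) { (void)vec; return false; }

static void deliver(int vec)
{
    if (idt_entry_mapped(vec)) {
        printf("dispatching handler for vector %d\n", vec);
        return;
    }
    if (vec == PF) {            /* IDT read for #PF faults -> escalate to #DF */
        printf("NESTED #PF while delivering #PF -> raising #DF\n");
        deliver(DF);
    } else if (vec == DF) {     /* IDT read for #DF faults -> triple fault */
        printf("NESTED #PF while delivering #DF -> #TRIPLE (CPU resets)\n");
    }
}

int main(void)
{
    /* The instruction fetch at an unmapped address raises the initial #PF. */
    deliver(PF);
    return 0;
}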

So... it would seem that something executing shortly prior to 10:ffffffff81842bdd managed to corrupt the page tables or reconfigure the core's CR3 or operating mode such that the page tables referenced by CR3 were not appropriately formatted in order to fetch that instruction, nor to allow access to the IDT.  Sometimes vmss2core's diagnostic output might give additional clues as to what's happening, since when run without the "-M" option it will make use of the guest's paging structures in much the same way as the virtual CPU would, so that it can produce a corefile that covers the entire mapped address space for each virtual CPU core according to its operating mode and CR3 value.  If not, you'll need to dig into how you got to 10:ffffffff81842bdd without usable paging structures.
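For illustration, here is a minimal sketch of the 4 KiB-page case of the 4-level x86-64 page walk that both the virtual CPU and vmss2core (without -M) perform. Again, this is hypothetical rather than the actual vmss2core source: phys_read64() is a made-up accessor over the guest memory image, and real code would also handle 2 MiB/1 GiB pages, PCIDs, and so on. A non-present entry at any level is exactly the condition behind the "Cannot translate linear address" message:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PRESENT   0x1ULL
#define ADDR_MASK 0x000ffffffffff000ULL

/* Hypothetical stand-in for reading the guest memory image (debug.vmem).
 * Returning 0 models a non-present entry, like the failing walk above. */
static uint64_t phys_read64(uint64_t pa) { (void)pa; return 0; }

/* Walk PML4 -> PDPT -> PD -> PT for a 4 KiB mapping, rooted at CR3. */
static bool translate(uint64_t cr3, uint64_t lin, uint64_t *out)
{
    uint64_t table = cr3 & ADDR_MASK;
    for (int level = 3; level >= 0; level--) {
        unsigned idx = (unsigned)((lin >> (12 + 9 * level)) & 0x1ff);
        uint64_t entry = phys_read64(table + idx * 8);
        if (!(entry & PRESENT))
            return false;       /* unmapped: translation fails here */
        table = entry & ADDR_MASK;
    }
    *out = table | (lin & 0xfff);
    return true;
}

int main(void)
{
    uint64_t pa;
    if (!translate(0x1000, 0xffff88007b600000ULL, &pa))
        printf("Cannot translate linear address ffff88007b600000.\n");
    return 0;
}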

Hope that is of some help!

Cheers,

--

Darius

sethgfromsun
Contributor

Excellent explanation.  Thank you!
