VMware Cloud Community
Amfab
Contributor

VMM fault 14 on one of our otherwise stable VMs

We've been running ESXi 6.0 build 3825889 for about 6 months now without any issues. Out of nowhere, one of our production Windows Server 2008 R2 VMs powered off this morning. On powering it back up, it was clear the power-off had been ungraceful, but there were no signs of a blue screen or minidump file inside the guest. I then looked to vSphere for information and saw these events for the VM (redacted info in brackets):

Description | Type | Date Time
Error message from [Host IP]: We will respond on the basis of your support entitlement. | error | 2/10/2017 7:35:34 AM
Virtual machine on [Host IP] is powered off | info | 2/10/2017 7:35:41 AM
Alarm 'Virtual machine CPU usage' changed from Green to Gray | info | 2/10/2017 7:35:42 AM
Alarm 'Virtual machine memory usage' changed from Green to Gray | info | 2/10/2017 7:35:42 AM
Task: Power On virtual machine (when I powered it back on) | info | 2/10/2017 8:03:17 AM

On checking the vmware.log file for the VM, I discovered the following:

2017-02-10T14:35:27.351Z| vcpu-0| W110: MONITOR PANIC: vcpu-2:VMM fault 14: src=MONITOR rip=0x200000001 regs=0xfffffffffc607d20

2017-02-10T14:35:27.351Z| vcpu-0| I120: Core dump with build build-3825889

2017-02-10T14:35:27.351Z| vcpu-3| I120: Exiting vcpu-3

2017-02-10T14:35:27.351Z| vcpu-1| I120: Exiting vcpu-1

2017-02-10T14:35:27.351Z| vcpu-0| W110: Writing monitor corefile "/vmfs/volumes/574e0774-f40bbd18-1006-246e96162c88/APPS/vmmcores.gz"

2017-02-10T14:35:27.351Z| vcpu-2| I120: Exiting vcpu-2

2017-02-10T14:35:27.361Z| vcpu-0| W110: Dumping core for vcpu-0

2017-02-10T14:35:27.361Z| vcpu-0| I120: CoreDump: dumping core with superuser privileges

2017-02-10T14:35:27.361Z| vcpu-0| I120: VMK Stack for vcpu 0 is at 0x439162a13000

2017-02-10T14:35:27.361Z| vcpu-0| I120: Beginning monitor coredump

2017-02-10T14:35:27.784Z| vcpu-0| I120: End monitor coredump

2017-02-10T14:35:27.785Z| vcpu-0| W110: Dumping core for vcpu-1

2017-02-10T14:35:27.785Z| vcpu-0| I120: CoreDump: dumping core with superuser privileges

2017-02-10T14:35:27.785Z| vcpu-0| I120: VMK Stack for vcpu 1 is at 0x439162b13000

2017-02-10T14:35:27.785Z| vcpu-0| I120: Beginning monitor coredump

2017-02-10T14:35:28.203Z| vcpu-0| I120: End monitor coredump

2017-02-10T14:35:28.204Z| vcpu-0| W110: Dumping core for vcpu-2

2017-02-10T14:35:28.204Z| vcpu-0| I120: CoreDump: dumping core with superuser privileges

2017-02-10T14:35:28.204Z| vcpu-0| I120: VMK Stack for vcpu 2 is at 0x439162b93000

2017-02-10T14:35:28.204Z| vcpu-0| I120: Beginning monitor coredump

2017-02-10T14:35:28.629Z| vcpu-0| I120: End monitor coredump

2017-02-10T14:35:28.629Z| vcpu-0| W110: Dumping core for vcpu-3

2017-02-10T14:35:28.629Z| vcpu-0| I120: CoreDump: dumping core with superuser privileges

2017-02-10T14:35:28.629Z| vcpu-0| I120: VMK Stack for vcpu 3 is at 0x439162c13000

2017-02-10T14:35:28.629Z| vcpu-0| I120: Beginning monitor coredump

2017-02-10T14:35:29.047Z| vcpu-0| I120: End monitor coredump

2017-02-10T14:35:34.860Z| vcpu-0| W110: A core file is available in "/vmfs/volumes/574e0774-f40bbd18-1006-246e96162c88/APPS/vmx-zdump.000"

2017-02-10T14:35:34.860Z| vcpu-0| I120: Msg_Post: Error

2017-02-10T14:35:34.860Z| vcpu-0| I120: [msg.log.error.unrecoverable] VMware ESX unrecoverable error: (vcpu-0)

2017-02-10T14:35:34.860Z| vcpu-0| I120+ vcpu-2:VMM fault 14: src=MONITOR rip=0x200000001 regs=0xfffffffffc607d20

2017-02-10T14:35:34.860Z| vcpu-0| I120: [msg.panic.haveLog] A log file is available in "/vmfs/volumes/574e0774-f40bbd18-1006-246e96162c88/APPS/vmware.log". 

2017-02-10T14:35:34.860Z| vcpu-0| I120: [msg.panic.requestSupport.withoutLog] You can request support. 

2017-02-10T14:35:34.860Z| vcpu-0| I120: [msg.panic.requestSupport.vmSupport.vmx86]

2017-02-10T14:35:34.860Z| vcpu-0| I120+ To collect data to submit to VMware technical support, run "vm-support".

2017-02-10T14:35:34.860Z| vcpu-0| I120: [msg.panic.response] We will respond on the basis of your support entitlement.

2017-02-10T14:35:34.860Z| vcpu-0| I120: ----------------------------------------

2017-02-10T14:35:34.861Z| vcpu-0| I120: Vigor_MessageRevoke: message 'msg.panic.response' (seq 2260) is revoked
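For anyone triaging a similar log, the panic line can be pulled out mechanically. Below is a minimal Python sketch, assuming the line format shown in the excerpt above. The field names (`reporter`, `vcpu`, `kind`, and so on) are labels made up for illustration, and the UTC-7 offset is only inferred from the vSphere events (7:35:34 AM) lining up with the 14:35:34Z log entries, not read from any host setting.

```python
import re
from datetime import datetime, timedelta, timezone

# Sample line copied from the vmware.log excerpt above.
SAMPLE = ("2017-02-10T14:35:27.351Z| vcpu-0| W110: MONITOR PANIC: "
          "vcpu-2:VMM fault 14: src=MONITOR rip=0x200000001 "
          "regs=0xfffffffffc607d20")

# Regex for the panic line as it appears in this log; other builds may differ.
PANIC_RE = re.compile(
    r"^(?P<ts>\S+)\| (?P<reporter>[\w-]+)\| \w+: MONITOR PANIC: "
    r"(?P<vcpu>vcpu-\d+):(?P<kind>VMM(?:64)?) fault (?P<fault>\d+): "
    r"src=(?P<src>\w+) rip=(?P<rip>0x[0-9a-fA-F]+) "
    r"regs=(?P<regs>0x[0-9a-fA-F]+)"
)

# Assumption: local time is UTC-7, inferred from the event timestamps.
ASSUMED_LOCAL = timezone(timedelta(hours=-7))


def parse_panic(line):
    """Return the fields of a MONITOR PANIC line as a dict, or None."""
    m = PANIC_RE.match(line.strip())
    if m is None:
        return None
    info = m.groupdict()
    info["fault"] = int(info["fault"])
    # The log timestamps end in "Z", so they are treated as UTC.
    info["utc"] = datetime.strptime(
        info["ts"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    return info


if __name__ == "__main__":
    info = parse_panic(SAMPLE)
    print(info["reporter"], info["vcpu"], info["kind"], info["fault"])
    print(info["utc"].astimezone(ASSUMED_LOCAL).isoformat())
```

One thing this makes visible: the line is written by the vcpu-0 thread, but the panic text itself names vcpu-2.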

Which led me to: Understanding VMM fault and VMM64 fault virtual machine monitor and executable failures (1021174) | ...

Type | Example error | Description
Exception 14 (Page Fault) | MONITOR PANIC: vcpu-0:VMM64 fault 14: src=MONITOR rip=0xfffffffffc2e99d3 regs=0xfffffffffc008e98 | Occurs when a program attempts to access a page mapped in the virtual address space, but it has not been successfully loaded into memory.
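The KB row pairs fault number 14 with "Page Fault", which matches the standard x86 exception vector numbering. Here is a small lookup sketch in Python, assuming (as the KB row suggests) that the number in "VMM fault N" is the x86 exception vector. The table is deliberately partial and `describe_fault` is a hypothetical helper name.

```python
# Partial table of architectural x86 exception vectors.
# Assumption: the "N" in "VMM fault N" is the x86 exception vector,
# as the KB row "Exception 14 (Page Fault)" suggests.
X86_EXCEPTIONS = {
    0: "Divide Error (#DE)",
    6: "Invalid Opcode (#UD)",
    8: "Double Fault (#DF)",
    13: "General Protection (#GP)",
    14: "Page Fault (#PF)",
    18: "Machine Check (#MC)",
}


def describe_fault(number):
    """Map a fault number to a readable name, if it is in the table."""
    return X86_EXCEPTIONS.get(
        number, f"vector {number} (not in this partial table)"
    )


if __name__ == "__main__":
    print(describe_fault(14))
```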

I did a vm-support dump and sent the resulting .tgz archive to Dell ProSupport (figured I'd try them first since we only have vSphere Essentials and would have to pay for an incident with VMware). Dell essentially came to the same point as I did (the VMware KB mentioned above) and tentatively recommended upgrading to a newer ESXi build, but they lacked the tools to analyze the coredump and dig deeper. The VM has been running fine since being powered back on after the incident this morning, but I'd still like to know what caused this. I thought I'd try my luck here before paying for an incident. I'm happy to provide the coredump files mentioned above if needed. Any help is greatly appreciated.

P.S. No settings had been changed on this VM recently (guest or vSphere), and the Event Viewer in the guest VM didn't shed any light on it, just the expected errors after booting up from an unexpected shutdown.

Edit: I find it odd that the log says "VMM fault" rather than "VMM64 fault" as in the KB example, which seemingly makes it a 32-bit fault even though the VM is 64-bit.
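As a quick sanity check on the 32-bit question, the register values from the panic line can be tested for width. This is only a sketch: `bit_width` and `fits_in_32_bits` are hypothetical helper names, and the result says nothing about why the log prints "VMM" instead of "VMM64". It only shows whether the logged values would fit in a 32-bit register.

```python
# Values copied from the MONITOR PANIC line in vmware.log.
RIP = "0x200000001"
REGS = "0xfffffffffc607d20"


def bit_width(hex_value):
    """Number of bits needed to represent the given hex string."""
    return int(hex_value, 16).bit_length()


def fits_in_32_bits(hex_value):
    """True if the value could be held in a 32-bit register."""
    return bit_width(hex_value) <= 32


if __name__ == "__main__":
    for name, value in (("rip", RIP), ("regs", REGS)):
        print(name, value, "fits in 32 bits:", fits_in_32_bits(value))
```

Neither value fits in 32 bits, so the monitor was at least reporting 64-bit-wide register contents at the time of the fault.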

Edit2: The hardware version of said VM is v10

Edit3: The host is a Dell PowerEdge R730
