VMware Cloud Community
tagil
Contributor

VM running on ESXi 6.7 host crashed and powered off

Error messages:

1. VM1 on esxi-1 in ha-datacenter is powered off

2. Error message on VM1 on esxi-1 in ha-datacenter: We will respond on the basis of your support entitlement.

3. An application (/bin/vmx) running on ESXi host has crashed (1 time(s) so far). A core file might have been created at /vmfs/volumes/60c97a26-4d981f78-4d71-ecebb8885488/1c-production/vmx-zdump.002.

swaheed1239
Enthusiast

Check for any abrupt disconnections from storage; VM crashes are most often related to storage issues. Have you noticed any APD (All Paths Down) or PDL (Permanent Device Loss) events?
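As a rough first pass for the APD/PDL question, a local copy of vmkernel.log can be scanned for those keywords. This is a minimal Python sketch; the keyword patterns are an assumption (the exact message wording differs between ESXi builds), so an empty result does not prove that no storage event occurred.

```python
import re

# Assumed patterns: tokens that begin with "APD" or "PDL", so variants with a
# suffix (e.g. an underscore) still match. Not an authoritative list of ESXi messages.
STORAGE_PATTERNS = {
    "APD": re.compile(r"\bAPD"),
    "PDL": re.compile(r"\bPDL"),
}


def scan_storage_events(lines):
    """Count and collect log lines that mention APD or PDL."""
    counts = {name: 0 for name in STORAGE_PATTERNS}
    hits = []
    for line in lines:
        for name, pattern in STORAGE_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                hits.append((name, line.rstrip("\n")))
    return counts, hits


# Usage on a local copy of the log (file name is just an example):
# with open("vmkernel.log", encoding="utf-8", errors="replace") as f:
#     counts, hits = scan_storage_events(f)
# print(counts)
```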

tagil
Contributor

Storage is on a local SSD drive. The other VMs are working fine without errors.

Log file from the crashed VM:

2021-09-30T07:37:03.505Z| vcpu-7| I125: Exiting vcpu-7
2021-09-30T07:37:03.505Z| vcpu-6| W115: MONITOR PANIC: vcpu-2:EPT misconfiguration: PA 16f2a99b78
2021-09-30T07:37:03.505Z| vcpu-5| I125: Exiting vcpu-5
2021-09-30T07:37:03.505Z| vcpu-4| I125: Exiting vcpu-4
2021-09-30T07:37:03.505Z| vcpu-0| I125: Exiting vcpu-0
2021-09-30T07:37:03.505Z| vcpu-3| I125: Exiting vcpu-3
2021-09-30T07:37:03.505Z| vcpu-2| I125: Exiting vcpu-2
2021-09-30T07:37:03.505Z| vcpu-6| I125: Core dump with build build-17700523
2021-09-30T07:37:03.505Z| vcpu-1| I125: Exiting vcpu-1
2021-09-30T07:37:03.510Z| vcpu-6| I125: Writing monitor file `vmmcores.gz`
2021-09-30T07:37:03.635Z| vcpu-6| W115: Dumping core for vcpu-0
2021-09-30T07:37:03.636Z| vcpu-6| I125: VMK Stack for vcpu 0 is at 0x451a69c13000
2021-09-30T07:37:03.636Z| vcpu-6| I125: Beginning monitor coredump
2021-09-30T07:37:03.729Z| mks| W115: Panic in progress... ungrabbing
2021-09-30T07:37:03.729Z| mks| I125: MKS: Release starting (Panic)
2021-09-30T07:37:03.729Z| mks| I125: MKS: Release finished (Panic)

...

Stevens7
Contributor

A VM may be automatically shut down or suspended if the environment was not in use: the environment-wide auto-shutdown options shut down or suspend all of the VMs in an environment after a period of inactivity. For more information, see How auto-shutdown works.


swaheed1239
Enthusiast

Did you check the vmkernel.log file for errors? Look for any errors around the timestamp when the VM crashed, or attach the vmkernel.log file so we can take a look.
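To narrow vmkernel.log down to the crash window, a small script can keep only the lines whose leading timestamp falls within a few seconds of the crash. This is a minimal Python sketch, assuming a local copy of the log and the `YYYY-MM-DDTHH:MM:SS.mmmZ` timestamp prefix seen in the log excerpts posted in this thread; `lines_near`, `warnings_only`, and the 10-second window are illustrative choices, not VMware tooling.

```python
from datetime import datetime, timedelta, timezone

# Assumed format: each line starts with an ISO-8601 UTC timestamp,
# e.g. "2021-09-30T07:37:03.505Z cpu11:2129905)WARNING: ...".
TS_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Three lines copied from the vmkernel.log excerpt posted in this thread.
SAMPLE = [
    "2021-09-30T07:37:03.505Z cpu11:2129905)WARNING: World: vm 2129905: 8463: vmm2:VM1:vcpu-2:EPT misconfiguration: PA 16f2a99b78",
    "2021-09-30T07:37:06.088Z cpu44:2097517)WARNING: CpuSched: 996: Automatic relation removal from 2129916(vmx-vcpu-0:VM1, zombie) to 2129917(LSI-2129900:0)",
    "2021-09-30T07:37:23.116Z cpu10:2097236)NMP: nmp_ThrottleLogForDevice:3818: last error status from device mpx.vmhba32:C0:T0:L0 repeated 1 times",
]


def parse_ts(line):
    """Return the leading timestamp of a log line as an aware datetime, or None."""
    stamp = line.split(" ", 1)[0]
    try:
        return datetime.strptime(stamp, TS_FORMAT).replace(tzinfo=timezone.utc)
    except ValueError:
        return None  # lines without a parsable timestamp are skipped


def lines_near(lines, crash_time, window_s=30):
    """Keep lines whose timestamp is within +/- window_s seconds of crash_time."""
    lo = crash_time - timedelta(seconds=window_s)
    hi = crash_time + timedelta(seconds=window_s)
    kept = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None and lo <= ts <= hi:
            kept.append(line.rstrip("\n"))
    return kept


def warnings_only(lines):
    """Narrow the result to lines tagged WARNING."""
    return [line for line in lines if "WARNING" in line]


if __name__ == "__main__":
    crash = datetime(2021, 9, 30, 7, 37, 3, tzinfo=timezone.utc)
    # For a real file: with open("vmkernel.log", errors="replace") as f: lines = list(f)
    for line in warnings_only(lines_near(SAMPLE, crash, window_s=10)):
        print(line)
```

With the crash at 07:37:03 and a 10-second window, the two WARNING lines are printed and the 07:37:23 NMP line is dropped.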

pkvmw
VMware Employee

The vmx core dumps generated when VMs crash can be analyzed by VMware Support, so I'd recommend filing an SR to get this investigated, especially if this is recurring behavior.

Otherwise it's pretty hard to guess why the VMX world crashed.

Regards,
Patrik

tagil
Contributor

2021-09-30T07:37:03.505Z cpu11:2129905)WARNING: World: vm 2129905: 8463: vmm2:VM1:vcpu-2:EPT misconfiguration: PA 16f2a99b78
2021-09-30T07:37:03.505Z cpu11:2129905)World: 8466: vmm group leader = 2129900, members = 8
2021-09-30T07:37:03.505Z cpu11:2129905)Backtrace for current CPU #11, worldID=2129905, fp=0x1e
2021-09-30T07:37:03.505Z cpu11:2129905)0x451a6b99bf30:[0x41802b73e5e0]WorldPanicWork@vmkernel#nover+0xd4 stack: 0xfffffffffffffc40, 0x451a6b9a3000, 0x100000000, 0x451a6b9a3000, 0x1e
2021-09-30T07:37:03.505Z cpu11:2129905)0x451a6b99bf90:[0x41802b73e95f]World_VMMPanic@vmkernel#nover+0x1c stack: 0x41802b75ce4d, 0xfffffffffc407cf0, 0x0, 0x400, 0x0
2021-09-30T07:37:03.505Z cpu11:2129905)0x451a6b99bfa0:[0x41802b736c6e]VMMVMKCall_Call@vmkernel#nover+0xf7 stack: 0x0, 0x400, 0x0, 0x82, 0x1
2021-09-30T07:37:03.505Z cpu11:2129905)0x451a6b99bfe0:[0x41802b75cecd]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x41802b75cec0, 0xfffffffffc008e12, 0x0, 0x0, 0x0
2021-09-30T07:37:06.088Z cpu44:2097517)WARNING: CpuSched: 996: Automatic relation removal from 2129916(vmx-vcpu-0:VM1, zombie) to 2129917(LSI-2129900:0)
2021-09-30T07:37:12.874Z cpu0:2097231)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x85 (0x459a8caa1240, 2100135) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
2021-09-30T07:37:12.874Z cpu0:2097231)ScsiDeviceIO: 3483: Cmd(0x459a8caa1240) 0x85, CmdSN 0x149 from world 2100135 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
2021-09-30T07:37:12.874Z cpu0:2097231)ScsiDeviceIO: 3483: Cmd(0x459a8caa1240) 0x85, CmdSN 0x14a from world 2100135 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
2021-09-30T07:37:17.045Z cpu1:2129923)UserDump: 3110: vmx-vcpu-6:VM1: Dumping cartel 2129899 (from world 2129923) to file /vmfs/volumes/60c97a26-4d981f78-4d71-ecebb8885488/VM1/vmx-zdump.003 ...
2021-09-30T07:37:18.571Z cpu1:2129923)UserDump: 3258: vmx-vcpu-6:VM1: Userworld(vmx-vcpu-6:VM1) coredump complete.
2021-09-30T07:37:18.577Z cpu12:2097296)cswitch: VSwitchDisablePTIOChainRemoveCB:1150: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Remove ptDisable IOChain Handle for port 0x2000009
2021-09-30T07:37:18.577Z cpu12:2097296)NetPort: 1580: disabled port 0x2000009
2021-09-30T07:37:18.578Z cpu20:2097296)Net: 3713: disconnected client from port 0x2000009
2021-09-30T07:37:22.519Z cpu39:2133823)WARNING: PFrame: vm 2129900: 2489: Deallocating pinned bpn 0x101785400, pinCount 1 throttle 0.
2021-09-30T07:37:23.042Z cpu22:2097348)CBT: 1388: Destroying device 33995-cbt for cbt driver with filehandle 211349
2021-09-30T07:37:23.042Z cpu6:2097347)CBT: 1388: Destroying device 63994-cbt for cbt driver with filehandle 407956
2021-09-30T07:37:23.116Z cpu10:2097236)NMP: nmp_ThrottleLogForDevice:3818: last error status from device mpx.vmhba32:C0:T0:L0 repeated 1 times

swaheed1239
Enthusiast

There are lots of VMware KBs on this issue, one of which is "ESX unrecoverable error: (vcpu-0) vcpu-0: EPT misconfiguration: PA fb100000" error seen in crashing...

Make sure your BIOS/firmware is up to date, including on your ESXi host. Keep all your infrastructure components up to date, and if the issue still recurs, involve VMware Support.
