Re: VM running on ESXI 6.7 host crashed and power ...

tagil · ‎09-29-2021

Error messages:

1. VM1 on esxi-1 in ha-datacenter is powered off

2. Error message on VM1 on esxi-1 in ha-datacenter: We will respond on the basis of your support entitlement.

3. An application (/bin/vmx) running on ESXi host has crashed (1 time(s) so far). A core file might have been created at /vmfs/volumes/60c97a26-4d981f78-4d71-ecebb8885488/1c-production/vmx-zdump.002.

swaheed1239 · ‎09-30-2021

Check for any abrupt disconnections from storage. VM crashes are mostly related to storage issues. Have you noticed any APD (All Paths Down) or PDL (Permanent Device Loss) events?

tagil · ‎09-30-2021

The storage is a local SSD drive. The other VMs are working fine without errors.

Logfile on crashed VM:

...

Stevens7 · ‎09-30-2021

A VM may be automatically shut down or suspended if: The environment was not in use. The environment-wide auto-shutdown options shut down or suspend all of the VMs in an environment after a period of inactivity. For more information, see How auto-shutdown works.

swaheed1239 · ‎09-30-2021

Did you check the vmkernel.log file for errors? Look for any errors around the timestamp when the VM crashed, or attach the vmkernel.log file so we can take a look.
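If it helps, here is a minimal sketch of that check in Python, to be run against a copy of the log. It assumes each entry starts with a UTC timestamp like `2021-09-30T07:37:03.505Z`; the function name `lines_near`, the 60-second window, and the keyword list are illustrative choices, not anything VMware ships.

```python
from datetime import datetime, timedelta


def lines_near(log_lines, crash_time, window_seconds=60,
               keywords=("WARNING", "ALERT", "failed", "Failed")):
    """Return log lines stamped within window_seconds of crash_time
    that contain at least one of the keywords."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in log_lines:
        # Assumption: the first space-separated token is the timestamp.
        stamp = line.split(" ", 1)[0]
        try:
            when = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%fZ")
        except ValueError:
            continue  # continuation line or unexpected format: skip it
        if abs(when - crash_time) <= window and any(k in line for k in keywords):
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open the copied log, pass the file object as `log_lines`, and pass the approximate power-off time as a `datetime`. Widen `window_seconds` if nothing shows up.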

pkvmw · ‎09-30-2021

The vmx core dumps generated when VMs crash can be analyzed by VMware Support, so I'd recommend filing an SR to get this investigated, especially if this is recurring behavior.

Otherwise it's pretty hard to guess why the VMX world might have crashed.

Regards,
Patrik

tagil · ‎09-30-2021

2021-09-30T07:37:03.505Z cpu11:2129905)WARNING: World: vm 2129905: 8463: vmm2:VM1:vcpu-2:EPT misconfiguration: PA 16f2a99b78
2021-09-30T07:37:03.505Z cpu11:2129905)World: 8466: vmm group leader = 2129900, members = 8
2021-09-30T07:37:03.505Z cpu11:2129905)Backtrace for current CPU #11, worldID=2129905, fp=0x1e
2021-09-30T07:37:03.505Z cpu11:2129905)0x451a6b99bf30:[0x41802b73e5e0]WorldPanicWork@vmkernel#nover+0xd4 stack: 0xfffffffffffffc40, 0x451a6b9a3000, 0x100000000, 0x451a6b9a3000, 0x1e
2021-09-30T07:37:03.505Z cpu11:2129905)0x451a6b99bf90:[0x41802b73e95f]World_VMMPanic@vmkernel#nover+0x1c stack: 0x41802b75ce4d, 0xfffffffffc407cf0, 0x0, 0x400, 0x0
2021-09-30T07:37:03.505Z cpu11:2129905)0x451a6b99bfa0:[0x41802b736c6e]VMMVMKCall_Call@vmkernel#nover+0xf7 stack: 0x0, 0x400, 0x0, 0x82, 0x1
2021-09-30T07:37:03.505Z cpu11:2129905)0x451a6b99bfe0:[0x41802b75cecd]VMKVMM_ArchEnterVMKernel@vmkernel#nover+0xe stack: 0x41802b75cec0, 0xfffffffffc008e12, 0x0, 0x0, 0x0
2021-09-30T07:37:06.088Z cpu44:2097517)WARNING: CpuSched: 996: Automatic relation removal from 2129916(vmx-vcpu-0:VM1, zombie) to 2129917(LSI-2129900:0)
2021-09-30T07:37:12.874Z cpu0:2097231)NMP: nmp_ThrottleLogForDevice:3872: Cmd 0x85 (0x459a8caa1240, 2100135) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0. Act:NONE
2021-09-30T07:37:12.874Z cpu0:2097231)ScsiDeviceIO: 3483: Cmd(0x459a8caa1240) 0x85, CmdSN 0x149 from world 2100135 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
2021-09-30T07:37:12.874Z cpu0:2097231)ScsiDeviceIO: 3483: Cmd(0x459a8caa1240) 0x85, CmdSN 0x14a from world 2100135 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x0 0x0.
2021-09-30T07:37:17.045Z cpu1:2129923)UserDump: 3110: vmx-vcpu-6:VM1: Dumping cartel 2129899 (from world 2129923) to file /vmfs/volumes/60c97a26-4d981f78-4d71-ecebb8885488/VM1/vmx-zdump.003 ...
2021-09-30T07:37:18.571Z cpu1:2129923)UserDump: 3258: vmx-vcpu-6:VM1: Userworld(vmx-vcpu-6:VM1) coredump complete.
2021-09-30T07:37:18.577Z cpu12:2097296)cswitch: VSwitchDisablePTIOChainRemoveCB:1150: [nsx@6876 comp="nsx-esx" subcomp="vswitch"]Remove ptDisable IOChain Handle for port 0x2000009
2021-09-30T07:37:18.577Z cpu12:2097296)NetPort: 1580: disabled port 0x2000009
2021-09-30T07:37:18.578Z cpu20:2097296)Net: 3713: disconnected client from port 0x2000009
2021-09-30T07:37:22.519Z cpu39:2133823)WARNING: PFrame: vm 2129900: 2489: Deallocating pinned bpn 0x101785400, pinCount 1 throttle 0.
2021-09-30T07:37:23.042Z cpu22:2097348)CBT: 1388: Destroying device 33995-cbt for cbt driver with filehandle 211349
2021-09-30T07:37:23.042Z cpu6:2097347)CBT: 1388: Destroying device 63994-cbt for cbt driver with filehandle 407956
2021-09-30T07:37:23.116Z cpu10:2097236)NMP: nmp_ThrottleLogForDevice:3818: last error status from device mpx.vmhba32:C0:T0:L0 repeated 1 times
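For reading the NMP/ScsiDeviceIO lines above, a small sketch that decodes the `H:/D:/P:` status triple and the sense key. The device status and sense key names are the standard SCSI ones; the function name `decode_scsi_status` and the partial lookup tables are illustrative. Command 0x85 is the ATA PASS-THROUGH (16) opcode, so a CHECK CONDITION with ILLEGAL REQUEST here would suggest the local device rejected a command it does not support rather than lost a path, though that is an interpretation and not a diagnosis.

```python
import re

# Standard SCSI device status codes (partial table).
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY",
                 0x18: "RESERVATION CONFLICT", 0x28: "TASK SET FULL"}

# Standard SCSI sense keys (partial table).
SENSE_KEYS = {0x0: "NO SENSE", 0x1: "RECOVERED ERROR", 0x2: "NOT READY",
              0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR",
              0x5: "ILLEGAL REQUEST", 0x6: "UNIT ATTENTION",
              0x7: "DATA PROTECT", 0xB: "ABORTED COMMAND"}

HEX = r"(0x[0-9a-fA-F]+)"
PATTERN = re.compile(
    rf"H:{HEX} D:{HEX} P:{HEX}(?: Valid sense data: {HEX} {HEX} {HEX})?"
)


def decode_scsi_status(line):
    """Decode the host/device/plugin status and sense data in one log line.
    Returns None if the line carries no H:/D:/P: triple."""
    match = PATTERN.search(line)
    if match is None:
        return None
    host, device, plugin = (int(v, 16) for v in match.group(1, 2, 3))
    result = {
        "host": host,
        "device": DEVICE_STATUS.get(device, hex(device)),
        "plugin": plugin,
    }
    if match.group(4) is not None:
        key, asc, ascq = (int(v, 16) for v in match.group(4, 5, 6))
        result["sense_key"] = SENSE_KEYS.get(key, hex(key))
        result["asc"], result["ascq"] = asc, ascq
    return result
```

Codes missing from the tables are returned as hex strings so they can be looked up by hand.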

swaheed1239 · ‎09-30-2021

There are lots of VMware KBs on this issue, one of which is "ESX unrecoverable error: (vcpu-0) vcpu-0: EPT misconfiguration: PA fb100000" error seen in crashing...

Make sure your BIOS and firmware are up to date, as well as your ESXi host. If the issue still recurs once all your infrastructure components are current, get support involved.

VM running on ESXI 6.7 host crashed and power off