We have some Linux VMs that are crashing or powering off randomly. I see the following in the vmware.log:
W110: MONITOR PANIC: vcpu-9:ASSERT vmcore/vmm/platform/common/platform.c:30 bugNr=17332
CoreDump: dumping core with superuser privileges
vcpu-0| I120: VMK Stack for vcpu 0 is at 0x4125a86d5000
vcpu-0| I120: Beginning monitor coredump
vcpu-0| I120: End monitor coredump
The hosts are a mix of ESXi 5.1 and 5.5, Dell and HP, and it's happening across multiple ones.
The vmkernel log shows the following:
World_VMMPanic@vmkernel#nover+0x28 stack: 0x4124dde27000, 0x4124dde2
World_VMMPanic@vmkernel#nover+0x28 stack: 0x418000000000, 0x410043cb
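The ASSERT line above (platform.c:30, bugNr=17332) works as a crash signature. Scanning each affected VM's vmware.log for it shows whether all the crashes share the same one. Below is a minimal Python sketch that assumes only the line layout quoted in this thread. The function and class names are hypothetical, and the tool does not interpret what the bug number means.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Matches the monitor ASSERT signature as it appears in the logs in this thread, e.g.
# "vcpu-9:ASSERT vmcore/vmm/platform/common/platform.c:30 bugNr=17332"
ASSERT_RE = re.compile(r"(vcpu-\d+):ASSERT\s+(\S+):(\d+)\s+bugNr=(\d+)")


@dataclass
class MonitorAssert:
    vcpu: str
    source_file: str
    source_line: int
    bug_nr: int


def find_monitor_asserts(lines: Iterable[str]) -> List[MonitorAssert]:
    """Return every ASSERT signature found in the given log lines, in order."""
    hits = []
    for line in lines:
        m = ASSERT_RE.search(line)
        if m:
            hits.append(MonitorAssert(m.group(1), m.group(2), int(m.group(3)), int(m.group(4))))
    return hits


def summarize(hits: List[MonitorAssert]) -> Dict[Tuple[str, int, int], int]:
    """Count crashes per (source file, source line, bugNr), ignoring which vCPU hit it."""
    return dict(Counter((h.source_file, h.source_line, h.bug_nr) for h in hits))


if __name__ == "__main__":
    # Sample lines copied from the logs quoted in this thread.
    sample = [
        "W110: MONITOR PANIC: vcpu-9:ASSERT vmcore/vmm/platform/common/platform.c:30 bugNr=17332",
        "vcpu-0| I120: Beginning monitor coredump",
        "2016-03-07T11:35:29.685Z| vcpu-0| I120+ vcpu-22:ASSERT vmcore/vmm/platform/common/platform.c:30 bugNr=17332",
    ]
    print(summarize(find_monitor_asserts(sample)))
```

Grouping by file, line and bugNr rather than by vCPU is deliberate. In this thread the failing vCPU differs from crash to crash (vcpu-9, vcpu-22), but the signature stays the same.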
We are facing exactly the same issue.
VM OS: SLES 11 SP3 for SAP Applications
The VM is hosting a SAP HANA Database.
ESX Host is 5.5 U2 Build 2068190
We have had this crash three times so far, and it always happened while copying backup files to a mounted NFS share.
The VM's vmware.log shows:
2016-03-07T11:35:29.685Z| vcpu-0| I120+ vcpu-22:ASSERT vmcore/vmm/platform/common/platform.c:30 bugNr=17332
The host's vmkernel.log shows:
2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: VmMemCow: vm 10511787: 3746: unable to alloc page: pgNum 0x2c7f362
2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: VmMemPf: vm 10511787: 676: COW copy failed: pgNum=0x2c7f362, mpn=0xffffffff
2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: VmMemPf: vm 10511787: 774: pgNum=0x2c7f362 failed
2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: World: vm 10511810: 11151: vmm22:sapbwp001v:vcpu-22:ASSERT vmcore/vmm/platform/common/platform.c:30 bugNr=17332
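In the excerpt above, the VmMemCow/VmMemPf page-allocation failures are logged in the same millisecond as the ASSERT. Across several crashes, it is worth checking whether that pairing always holds. The following Python sketch assumes the vmkernel.log WARNING layout shown above. The 60-second window and all names are hypothetical choices, not anything VMware documents.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

# Layout of the WARNING lines quoted above:
# "<timestamp> cpu<N>:<id>)WARNING: <Module>: <message>"
VMK_WARNING_RE = re.compile(r"^(\S+Z)\s+cpu(\d+):(\d+)\)WARNING:\s+(\w+):\s+(.*)$")

# Modules that logged the page-allocation / copy-on-write failures in the excerpt.
MEMORY_MODULES = {"VmMemCow", "VmMemPf"}


@dataclass
class VmkWarning:
    time: datetime
    cpu: int
    world: int
    module: str
    message: str


def parse_warning(line: str) -> Optional[VmkWarning]:
    """Parse one vmkernel WARNING line; return None if it does not match the layout."""
    m = VMK_WARNING_RE.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
    return VmkWarning(ts, int(m.group(2)), int(m.group(3)), m.group(4), m.group(5))


def memory_failures_before_asserts(
    lines: Iterable[str], window: timedelta = timedelta(seconds=60)
) -> List[Tuple[VmkWarning, List[VmkWarning]]]:
    """Pair each ASSERT warning with the memory warnings logged up to `window` before it.

    Matching is by time only: in the excerpt, the memory warnings and the ASSERT
    line carry different "vm" numbers, so they cannot be joined on that field.
    """
    events = [w for w in map(parse_warning, lines) if w is not None]
    pairs = []
    for ev in events:
        if ":ASSERT " in ev.message:
            related = [
                w for w in events
                if w.module in MEMORY_MODULES and ev.time - window <= w.time <= ev.time
            ]
            pairs.append((ev, related))
    return pairs
```

Running it over each crash's vmkernel.log should show quickly whether an ASSERT ever appears without preceding memory failures. If none does, that supports the memory-related explanation below.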
A ticket with VMware has been open since last Thursday, but we have not received any useful information yet.
SUSE Support has been informed too, but they say it's a VMware-related problem and want to close the case.
A support case is also open with the external company that supports our SAP environment, but they have no useful information either.
I'd be grateful for any ideas, because I have none at the moment!
Most likely you are using memory reservations or NUMA memory affinity. For memory reservations, make sure you allow for ~5% overhead; for NUMA memory affinity, remove it. This option was actually removed from the Web Client. If you are using NUMA optimization (as you would for SAP), use numa.nodeAffinity instead.
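To check a VM against this advice, you can read the settings straight from its .vmx file. Below is a rough Python sketch. It treats "allow for ~5% overhead" as meaning that the reservation plus 5% must fit in whatever memory you consider available. That reading is an assumption: the reply does not say whether the 5% is measured against host or NUMA-node memory, so `available_mb` is left as a caller-supplied, hypothetical input. The sketch also assumes `sched.mem.min` holds the reservation in MB. It only reports `numa.nodeAffinity`. It does not try to detect the older NUMA memory affinity option, whose .vmx key is not named in this thread.

```python
import re
from typing import Dict, List

# .vmx files store one `key = "value"` pair per line.
VMX_LINE_RE = re.compile(r'^\s*([\w.:-]+)\s*=\s*"(.*)"\s*$')

# "~5% overhead" from the reply above, kept as an integer percentage.
OVERHEAD_PCT = 5


def parse_vmx(text: str) -> Dict[str, str]:
    """Collect key/value pairs from .vmx text, skipping lines that do not match."""
    settings: Dict[str, str] = {}
    for line in text.splitlines():
        m = VMX_LINE_RE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def reservation_fits(reservation_mb: int, available_mb: int, overhead_pct: int = OVERHEAD_PCT) -> bool:
    """True if the reservation plus overhead_pct percent still fits in available_mb."""
    # Integer arithmetic avoids float rounding right at the boundary.
    return reservation_mb * (100 + overhead_pct) <= available_mb * 100


def parse_node_affinity(value: str) -> List[int]:
    """Turn a numa.nodeAffinity value such as "0,1" into [0, 1]."""
    return [int(part) for part in value.split(",") if part.strip()]


def review_vmx(settings: Dict[str, str], available_mb: int) -> List[str]:
    """Return human-readable notes about the reservation and NUMA settings."""
    notes = []
    # Assumption: sched.mem.min is the memory reservation in MB.
    reservation_mb = int(settings.get("sched.mem.min", "0"))
    if reservation_mb and not reservation_fits(reservation_mb, available_mb):
        notes.append(
            f"reservation of {reservation_mb} MB leaves less than "
            f"{OVERHEAD_PCT}% headroom in {available_mb} MB"
        )
    if "numa.nodeAffinity" in settings:
        nodes = parse_node_affinity(settings["numa.nodeAffinity"])
        notes.append(f"numa.nodeAffinity restricts the VM to NUMA nodes {nodes}")
    return notes
```

For example, a VM with `sched.mem.min = "262144"` checked against 262144 MB of available memory would get a headroom note, because 262144 MB plus 5% does not fit.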