VMware Cloud Community
brian_rapidresp
Contributor

Linux VMs crashing/dumping

We have some Linux VMs that are crashing or powering off randomly. I see the following in the vmware.log:

W110: MONITOR PANIC: vcpu-9:ASSERT vmcore/vmm/platform/common/platform.c:30 bugNr=17332

CoreDump: dumping core with superuser privileges

vcpu-0| I120: VMK Stack for vcpu 0 is at 0x4125a86d5000

vcpu-0| I120: Beginning monitor coredump

vcpu-0| I120: End monitor coredump


The hosts are a mix of ESXi 5.1 and 5.5, Dell and HP, and it is happening across multiple hosts.


The vmkernel log shows the following:


World_VMMPanic@vmkernel#nover+0x28 stack: 0x4124dde27000, 0x4124dde2

World_VMMPanic@vmkernel#nover+0x28 stack: 0x418000000000, 0x410043cb

VMMVMKCall_Call@vmkernel#nover+0x48c

Any ideas?

2 Replies
PKaufmann
Enthusiast

We are facing exactly the same issue.

VM OS: SLES 11 SP3 for SAP Applications

The VM is hosting a SAP HANA Database.

ESX Host is 5.5 U2 Build 2068190

We have had this crash three times so far, and each time it happened while copying backup files to a mounted NFS share.

vmware.log of the VM shows:

2016-03-07T11:35:29.685Z| vcpu-0| I120+ vcpu-22:ASSERT vmcore/vmm/platform/common/platform.c:30 bugNr=17332

vmkernel.log of the Host shows:

2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: VmMemCow: vm 10511787: 3746: unable to alloc page: pgNum 0x2c7f362

2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: VmMemPf: vm 10511787: 676: COW copy failed: pgNum=0x2c7f362, mpn=0xffffffff

2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: VmMemPf: vm 10511787: 774: pgNum=0x2c7f362 failed

2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: World: vm 10511810: 11151: vmm22:sapbwp001v:vcpu-22:ASSERT vmcore/vmm/platform/common/platform.c:30 bugNr=17332
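To check whether the monitor panic is always preceded by page-allocation failures, as in the vmkernel.log excerpt above, a small script can pull out the relevant lines. This is only a sketch: the regular expression is inferred from the four lines quoted in this post rather than from any documented log format, and the names `find_panic_context` and `SAMPLE` are made up for illustration.

```python
import re

# Pattern inferred from the vmkernel.log lines quoted above (an assumption,
# not a documented format): timestamp, cpuN:worldID), level, then the message.
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+cpu(?P<cpu>\d+):(?P<world>\d+)\)"
    r"(?P<level>[A-Z]+):\s+(?P<msg>.*)$"
)

# The four lines from the post, used here as sample input.
SAMPLE = [
    "2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: VmMemCow: vm 10511787: 3746: unable to alloc page: pgNum 0x2c7f362",
    "2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: VmMemPf: vm 10511787: 676: COW copy failed: pgNum=0x2c7f362, mpn=0xffffffff",
    "2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: VmMemPf: vm 10511787: 774: pgNum=0x2c7f362 failed",
    "2016-03-07T11:33:10.361Z cpu96:10511810)WARNING: World: vm 10511810: 11151: vmm22:sapbwp001v:vcpu-22:ASSERT vmcore/vmm/platform/common/platform.c:30 bugNr=17332",
]


def find_panic_context(lines):
    """Split parsed log lines into memory warnings and monitor ASSERTs."""
    mem_warnings, asserts = [], []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that do not fit the inferred pattern
        record = match.groupdict()
        if "ASSERT" in record["msg"]:
            asserts.append(record)
        elif record["msg"].startswith(("VmMemCow:", "VmMemPf:")):
            mem_warnings.append(record)
    return mem_warnings, asserts


if __name__ == "__main__":
    warnings, panics = find_panic_context(SAMPLE)
    for rec in warnings + panics:
        print(rec["ts"], rec["world"], rec["msg"])
```

Running it against a full vmkernel.log (for example by passing `open(path)` instead of `SAMPLE`) would show whether each ASSERT shares a timestamp and world ID with a preceding "unable to alloc page" warning.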

A ticket with VMware has been open since last Thursday, but we have not received any useful information yet.

SUSE Support has been informed too, but they say it is a VMware-related problem and want to close the case.

A support case is also open with the external company that supports our SAP environment, but they have had no useful information either.

I would be grateful for any ideas, because I have none at the moment!

admin
Immortal

Most likely you are using memory reservations or NUMA memory affinity. For memory reservations, make sure you allow for ~5% overhead; for NUMA memory affinity, remove it. This option was actually removed from the Web Client. If you are using NUMA optimization (as you would for SAP), use numa.nodeAffinity instead.
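One way to read the ~5% guidance above is that a memory reservation should not consume all of the physical memory it has to fit in, so that roughly 5% stays free for overhead. The arithmetic can be sketched as follows. The function names, the example size, and the interpretation itself are assumptions for illustration, not a formula documented by VMware.

```python
def max_reservation_mb(available_mb, overhead_percent=5):
    """Largest reservation (MB) that leaves `overhead_percent` of memory free.

    `available_mb` is whatever pool the reservation must fit in (an
    assumption: host memory, or one NUMA node's memory if the VM is pinned).
    """
    if not 0 <= overhead_percent < 100:
        raise ValueError("overhead_percent must be in [0, 100)")
    # Integer arithmetic avoids floating-point rounding surprises.
    return available_mb * (100 - overhead_percent) // 100


def reservation_has_headroom(reservation_mb, available_mb, overhead_percent=5):
    """True if the reservation still leaves the requested headroom free."""
    return reservation_mb <= max_reservation_mb(available_mb, overhead_percent)


if __name__ == "__main__":
    pool_mb = 1000  # hypothetical pool size, chosen only to keep the numbers simple
    print(max_reservation_mb(pool_mb))
    print(reservation_has_headroom(1000, pool_mb))
    print(reservation_has_headroom(900, pool_mb))
```

A reservation equal to the whole pool fails the check, which would be the situation where the VM has no room left for overhead allocations.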

http://h20564.www2.hpe.com/hpsc/doc/public/display?docId=mmr_kc-0128169

http://pubs.vmware.com/vsphere-60/index.jsp#com.vmware.vsphere.resmgmt.doc/GUID-A80A6337-7B99-48C8-B...

https://www.vmware.com/files/pdf/techpaper/VMware-PerfBest-Practices-vSphere6-0.pdf
