VMware Cloud Community
Intercity
Contributor

(Releasebuild-10302608) PANIC bora/vmkernel/main/dlmalloc.c:4924 - Usage error in dlmalloc

Hi,

Our HP DL580 Gen9 ESXi hosts and vCenter Server were updated from 6.7 to 6.7 U1. We used the HPE ESXi bundles and upgraded the Gen9 firmware, BIOS, and drivers as part of the process, but one of the hosts always crashes when we migrate the high-performance virtual machines.

Please see the attached screenshot.

We are asking for help,

Thanks,

Regards,

screenshot.JPG

6 Replies
cernyj
Enthusiast

Hello, we had bora/vmkernel/main/dlmalloc PSODs during replication/SRM of large VMs.

It was fixed by the bugfix released on 2018-11-09, so try upgrading to the latest ESXi build. This should help.

https://esxi-patches.v-front.de/ESXi-6.7.0.html

But I don't know whether there is an HPE customized image of that build.

Jiri

SupreetK
Commander

Relevant KB: VMware Knowledge Base. The issue has been fixed in 6.7 EP-5 (build-10764712).

- Supreet

Jbir
Enthusiast

Is anyone still getting this error? We updated to EP5 and now also to EP6, but we still get the PSODs. We are not using Site Recovery Manager. EP6 has been applied and running for the last 2 or 3 weeks, and we experienced another PSOD this afternoon.

We are using HPE BL460c G10 blades that were all originally built with the HPE custom ISO.

cernyj
Enthusiast

Hello, we are currently running:

VMware ESXi, 6.7.0, 10764712

vSphere Client version 6.7.0.20000

vSphere Replication Appliance 8.1.1.7173 Build 10721838

Site Recovery Version 8.1.1, Build 10646916

Since the ESXi upgrade to build 10764712, the servers have been running for 95 days, fortunately without any PSODs (knock on wood).

We replicate about 25 VMs, Windows and Linux, with compression and quiescing, various disk sizes.

ESXi is installed from the VMware ISO; the server hardware is Huawei 1288H V5.

Do you need to use the HPE custom ISO?

Is it possible for you to try reinstalling ESXi from the original VMware ISO?

Jiri

Jbir
Enthusiast

They were all built with an HPE custom ISO. I have opened a support case with VMware to see what they suggest. I am hoping not to have to reinstall, but will see what they recommend.

Jbir
Enthusiast

The reply from VMware was that there is a problem report open with engineering: when the load-based NetQueue balancer module misses or fails to clean up the RSS engine's private data, the dedicated heap fills up, and that causes the crash during subsequent load balancing.

This will be fixed in a future release. In the meantime, the workaround is to apply the command below to all NICs in every affected server:

esxcli network nic queue loadbalancer set --rsslb=false -n vmnicX

(where X is the vmnic number)

KB article: https://kb.vmware.com/kb/58874
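
Since the workaround has to be repeated for every NIC, a small dry-run helper may save some typing. This is only a sketch: the function name `rsslb_disable_cmds` and the NIC names in the example are hypothetical, and the helper does nothing but print the command quoted above once per NIC name passed to it.

```shell
# Dry run: print the workaround command for each NIC name given as an argument.
# Nothing is executed here; review the output before running it on a host.
rsslb_disable_cmds() {
  for nic in "$@"; do
    echo "esxcli network nic queue loadbalancer set --rsslb=false -n $nic"
  done
}

# Example with hypothetical NIC names; substitute the names your host reports.
rsslb_disable_cmds vmnic0 vmnic1
```

The NIC names on a host can be listed with `esxcli network nic list`. Once the printed commands look right, they can be pasted into the ESXi shell or piped to `sh`.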

I have applied the setting to all 8 NICs in 5 of my UAT hosts, and everything seemed to work fine over the weekend. I am going to do production this week, but only time will tell whether it has actually fixed the issue.

This was my PSOD error for reference:

Panic Details: Crash at 2019-02-25T07:04:04.844Z on CPU 4 running world 2097340. VMK Uptime:20:17:54:31.077
Panic Message: @BlueScreen: PANIC bora/vmkernel/main/dlmalloc.c:4924 - Usage error in  dlmalloc
Backtrace:
      0x451a45e1bb00:[0x41803b90ac15]PanicvPanicInt@vmkernel#nover+0x439 stack: 0x0, 0x41803bc9ffc0, 0x451a45e1bba8, 0x0, 0x1
      0x451a45e1bba0:[0x41803b90ae48]Panic_NoSave@vmkernel#nover+0x4d stack: 0x451a45e1bc00, 0x451a45e1bbc0, 0x451a45e1bc18, 0x41803bc9ff79, 0x133c
      0x451a45e1bc00:[0x41803b953442]DLM_free@vmkernel#nover+0x657 stack: 0x430e2609f590, 0x41803b950631, 0x430e2605b030, 0x845e1bc78, 0x451a00000000
      0x451a45e1bc20:[0x41803b950630]Heap_Free@vmkernel#nover+0x115 stack:0x451a00000000, 0x80, 0x43053b7388b0, 0x43053b738860, 0x43053b7388b0
      0x451a45e1bc70:[0x41803c4bbd30]RSSPlugCleanupRSSEngine@(lb_netqueue_bal)#<None>+0x7d stack: 0x43053b738860, 0x41803c4bbf2b, 0x430e2605c1d8, 0x43053b738860, 0x0
      0x451a45e1bc90:[0x41803c4bbf2a]RSSPlugInitRSSEngine@(lb_netqueue_bal)#<None>+0x127 stack: 0x0, 0x20c49ba5e353f7cf, 0x43053b7388b0, 0x43053b738780, 0x43053b738970
      0x451a45e1bcd0:[0x41803c4bc21c]RSSPlug_PreBalanceWork@(lb_netqueue_bal)#<None>+0x1cd stack: 0x32, 0x32, 0x0, 0xe0, 0x43053b738780
      0x451a45e1bd30:[0x41803c4b8752]Lb_PreBalanceWork@(lb_netqueue_bal)#<None>+0x21f stack: 0x43053b738780, 0xff, 0x0, 0x4304d0431840, 0x43053b738780
      0x451a45e1bd80:[0x41803ba166c8]UplinkNetqueueBal_BalanceCB@vmkernel#nover+0x6f1 stack: 0x43053b6db088, 0x43053b7387c0, 0x43053b738780, 0x43053ae4c5f0, 0x43053b6db088
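
To check whether another PSOD matches this one, it can help to reduce each backtrace frame to its module and function name. The following is a rough sketch; the helper name `frames` is hypothetical, and the pattern assumes frames are formatted like the ones above (`[address]Function@module#...`, with the module optionally in parentheses).

```shell
# Print "<module> <function>" for each backtrace frame read from stdin.
# Lines that do not look like a frame are skipped.
frames() {
  sed -n 's/.*\]\([A-Za-z_0-9]*\)@[(]*\([A-Za-z_0-9]*\)[)]*#.*/\2 \1/p'
}

# Two frames copied verbatim from the backtrace above.
frames <<'EOF'
0x451a45e1bc20:[0x41803b950630]Heap_Free@vmkernel#nover+0x115 stack:0x451a00000000, 0x80, 0x43053b7388b0, 0x43053b738860, 0x43053b7388b0
0x451a45e1bc70:[0x41803c4bbd30]RSSPlugCleanupRSSEngine@(lb_netqueue_bal)#<None>+0x7d stack: 0x43053b738860, 0x41803c4bbf2b, 0x430e2605c1d8, 0x43053b738860, 0x0
EOF
# prints:
# vmkernel Heap_Free
# lb_netqueue_bal RSSPlugCleanupRSSEngine
```

Frames from `lb_netqueue_bal` directly below `Heap_Free` and `DLM_free` are what tie this panic to the NetQueue load balancer described in the VMware reply.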

