This is a known issue with ESXi 6.7.
Symptom: An ESXi host might fail with the error PANIC bora/vmkernel/main/dlmalloc.c:4924 - Usage error in dlmalloc
When you replicate virtual machines using the VMware Site Replication Manager hbr_filter with vSphere Replication, the ESXi host might fail with purple screen immediately or within 24 hours and reports the error: PANIC bora/vmkernel/main/dlmalloc.c:4924 - Usage error in dlmalloc.
This issue is resolved in vSphere 6.7 EP5 - ESXi670-201811001
We published the following KB https://kb.vmware.com/s/article/55650.
I would recommend opening a ticket with VMware support and share the host log bundle along with the PSOD screenshot.
If you have a dump file available in the host log bundle what you can do is try to look at the vmkernel.log file, vmkwarning.log, vobd.log & hostd.log file to see what happened just before the host has crashed, that is what VMware is going to check prior to analyzing the dump file.
I hope it helps.
Hello; I tried to send you a private message but it seemed to fail. VMware support is tracking this issue for us also; however since the last time it occured the Sr. Solution Engineer who had been working the issue is now on PTO until July 3. Have you received any additional information as to a fix or workaround to avoid? We were told the issue was likely something related to VDR (vsphere replication).
I received this same issue twice today on 2 different hosts. The last response on this thread was back over a month ago. Is there any solution to this problem?
These heap related PSOD's are tricky. This one does not seem to be reported yet. You might want to raise a case with VMware if you want to get to the bottom of this.
If I am not wrong, this seems a BUG. contact VMware support to file a case with their engineering
I have Dell M630 blade servers running current firmware and received this PSOD on the first upgraded machine two hours after installing ESXi 6.7.0 Build 8941472 (Dell specific ISO - VMware-VMvisor-Installer-6.7.0-8941472.x86_64-DellEMC_Customized-A02.iso). I stopped the upgrade of the rest of the ESXi cluster nodes. They run fine under 6.5.0 Build 7388607. The host in question was replicating a VM to our disaster recovery site during the crash. The VM in question was stuck and did not free up until the host was rebooted.
Based on the functions reported on the PSOD screen, looks like the issue is in the vSphere Replication stack. You may want to pause the replication in case it is happening too frequently. In the meantime, engage VMware support to get further assistance on this.
Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.
"Engineering is currently investigating PSODs with the exact same trace that is occuring due to a HBR module (OEM Provided - Qlogic, in this case)."
I've updated my NIC drivers as suggested by support and have continued testing a couple of my hosts. These hosts are now running ESXi 6.7.0 build 9214924.
In our case the upgrade left an older QLogic driver on our hosts - qfle3 version 188.8.131.52. I installed VMware ESXi 6.7 Driver CD for QLogic Network/iSCSI/FCoE Driver Set from the Drivers and Tools tab which contains qfle3 184.108.40.206. This is the second day of testing but so far no PSODs. Check your NICs against the VMware compatibility list:
SSH into your ESXi host
esxcli network nic list #List your NICs
esxcli network nic get -n vmnic0 #Get your software and firmware versions for each NIC