Our HPE DL580 Gen9 ESXi hosts and vCenter Server were updated from 6.7 to 6.7 U1. We used the HPE ESXi bundles and also upgraded the Gen9 firmware, BIOS, and drivers, but one of the hosts always crashes when we migrate the high-performance virtual machines.
Please see pic.
We are asking for help.
Hello, we had bora/vmkernel/main/dlmalloc PSODs during replication/SRM of large VMs.
It was fixed by a bugfix from 2018-11-09, so try upgrading to the latest ESXi. This should help you.
But I don't know whether there is an HPE customized image of that build.
Is anyone still getting this error? We have updated to EP5 and now also EP6, but we still get the PSODs. We are not using Site Recovery Manager. EP6 has been applied and running for the last 2 or 3 weeks, and we experienced another PSOD this afternoon.
We are using HPE BL460c G10 blades that were all originally built with the HPE custom ISO.
Hello, we are currently running:
VMware ESXi, 6.7.0, 10764712
vSphere Client version 188.8.131.5200
vSphere Replication Appliance 184.108.40.20673 Build 10721838
Site Recovery Version 8.1.1, Build 10646916
Since the ESXi upgrade to build 10764712, the servers have been running for 95 days, fortunately without any PSODs (knock on wood).
We replicate about 25 VMs, Windows and Linux, with compression and quiescing, various disk sizes.
ESXi is installed from the VMware ISO; the server hardware is Huawei 1288H V5.
Do you need to use the HPE custom ISO?
Is it possible for you to try reinstalling ESXi from the original VMware ISO?
The reply from VMware was that there is a problem report with engineering: when the load-based netqueue balancer module misses or fails to clean up the RSS engine private data, the dedicated heap fills up, and that causes the crash during subsequent load balancing.
This will be fixed in a future release. The workaround in the meantime is to apply the command below to all NICs in every affected server:
esxcli network nic queue loadbalancer set --rsslb=false -n vmnicX
(where X is the vmnic number)
KB article: https://kb.vmware.com/kb/58874
I have applied the setting to all 8 NICs in 5 of my UAT hosts, and all seems to be working OK over the weekend. I am going to do production this week, but only time will tell whether it has actually fixed the issue.
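For anyone applying the workaround to many NICs, here is a minimal sketch that prints the command from above once per NIC so it can be reviewed before running. The function name rsslb_disable_cmds is made up for illustration, and the way of listing NIC names shown in the comment is an assumption about the table layout of `esxcli network nic list`, so check it on your own host first.

```shell
#!/bin/sh
# Dry run: print the workaround command once for each NIC name passed in.
# Nothing is executed here; review the output, then run the lines on the host.
rsslb_disable_cmds() {
    for nic in "$@"; do
        echo "esxcli network nic queue loadbalancer set --rsslb=false -n $nic"
    done
}

# Example with two NIC names. On a host, the names could instead be taken from
# the first column of `esxcli network nic list` (assumed layout: two header
# lines, then one row per NIC), e.g.:
#   rsslb_disable_cmds $(esxcli network nic list | awk 'NR>2 {print $1}')
rsslb_disable_cmds vmnic0 vmnic1
```

Printing instead of executing keeps the sketch safe to try anywhere; pipe the output to sh on the ESXi host only once the NIC list looks right.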
This was my PSOD error for reference:
Panic Details: Crash at 2019-02-25T07:04:04.844Z on CPU 4 running world 2097340. VMK Uptime:20:17:54:31.077
Panic Message: @BlueScreen: PANIC bora/vmkernel/main/dlmalloc.c:4924 - Usage error in dlmalloc
0x451a45e1bb00:[0x41803b90ac15]PanicvPanicInt@vmkernel#nover+0x439 stack: 0x0, 0x41803bc9ffc0, 0x451a45e1bba8, 0x0, 0x1
0x451a45e1bba0:[0x41803b90ae48]Panic_NoSave@vmkernel#nover+0x4d stack: 0x451a45e1bc00, 0x451a45e1bbc0, 0x451a45e1bc18, 0x41803bc9ff79, 0x133c
0x451a45e1bc00:[0x41803b953442]DLM_free@vmkernel#nover+0x657 stack: 0x430e2609f590, 0x41803b950631, 0x430e2605b030, 0x845e1bc78, 0x451a00000000
0x451a45e1bc20:[0x41803b950630]Heap_Free@vmkernel#nover+0x115 stack:0x451a00000000, 0x80, 0x43053b7388b0, 0x43053b738860, 0x43053b7388b0
0x451a45e1bc70:[0x41803c4bbd30]RSSPlugCleanupRSSEngine@(lb_netqueue_bal)#<None>+0x7d stack: 0x43053b738860, 0x41803c4bbf2b, 0x430e2605c1d8, 0x43053b738860, 0x0
0x451a45e1bc90:[0x41803c4bbf2a]RSSPlugInitRSSEngine@(lb_netqueue_bal)#<None>+0x127 stack: 0x0, 0x20c49ba5e353f7cf, 0x43053b7388b0, 0x43053b738780, 0x43053b738970
0x451a45e1bcd0:[0x41803c4bc21c]RSSPlug_PreBalanceWork@(lb_netqueue_bal)#<None>+0x1cd stack: 0x32, 0x32, 0x0, 0xe0, 0x43053b738780
0x451a45e1bd30:[0x41803c4b8752]Lb_PreBalanceWork@(lb_netqueue_bal)#<None>+0x21f stack: 0x43053b738780, 0xff, 0x0, 0x4304d0431840, 0x43053b738780
0x451a45e1bd80:[0x41803ba166c8]UplinkNetqueueBal_BalanceCB@vmkernel#nover+0x6f1 stack: 0x43053b6db088, 0x43053b7387c0, 0x43053b738780, 0x43053ae4c5f0, 0x43053b6db088
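Since this thread covers more than one cause of the same dlmalloc panic, it may help to check whether a given PSOD shows the netqueue balancer frames seen in the backtrace above. This is only a rough sketch: the function name looks_like_rss_lb_psod and the file psod.txt (a text copy of the panic screen) are hypothetical, and a match is a hint, not a diagnosis.

```shell
#!/bin/sh
# Return 0 if the saved panic text in file "$1" contains both the dlmalloc
# panic message and at least one frame from the lb_netqueue_bal module.
looks_like_rss_lb_psod() {
    grep -q 'Usage error in dlmalloc' "$1" &&
        grep -q '@(lb_netqueue_bal)' "$1"
}

# Example usage; psod.txt is a hypothetical file holding the panic text.
if [ -f psod.txt ] && looks_like_rss_lb_psod psod.txt; then
    echo "backtrace includes lb_netqueue_bal frames"
fi
```

If the backtrace has no lb_netqueue_bal frames, the RSS load balancer workaround may not be relevant to that crash.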