TomasAZ01
Contributor
Contributor

dlmalloc.c:4908 - Corruption in dlmalloc

ESXi 6.7.0u2 crashes with the following pink screen. Started after the upgrade from 6.5 to 6.7u2.

Is this a known issue or someone here with PowerEdge R630 experiencing this?

pastedImage_0.png

0 Kudos
9 Replies
msripada
Virtuoso
Virtuoso

Looking at the psod, its more talking about your qlogic ql_fcoe (qfle3f driver)

I found this third part blog which is on similar grounds, you can enable qlfe3 .. (btw, its in german so use translator)

ESXi 6.7 Purple Screen Qlogic 57xxx 

Thanks,

MS

0 Kudos
TomasAZ01
Contributor
Contributor

Hi!

Yes it is mentioned on several places that the qlfe3 driver has caused PSODs for a number of people in different situations.
But i believed this was in the past and should have been fixed. Most of the issues with this was in 2018.

If this is the reason the solution seems to be to revert to a previous driver, namely bnx2x.

Or is there another solution?

0 Kudos
msripada
Virtuoso
Virtuoso

well, in this case.. if you have latest driver/firmware available, you can update them else I would suggest open a SR with VMware GSS so that they can investigate this further

Thanks,

MS

0 Kudos
dozoekgr
Contributor
Contributor

I had a similar PSOD yesterday on one of my hosts.

It's strange as it occurs in FCoE driver that we don't use. It gets loaded anyways since new qfle3 driver stack (it didn't get loaded with bnx2 legacy drivers) when iSCSI offload is enabled but HBA stays in "offline" state. I had reconfigured iSCSI hardware offload to software initiator due to bad hardware performance about a month ago but didn't bother to disable offload functionality in OROM (manual and time-consuming task that requires power cycle).

I had same ESXi build and driver was a bit out of date (from February as included in baseline, latest qfle3f is the same in June and July updates). I updated all and disabled offload on all hosts with similar hardware, just in case.

I created SR #19016424207 just before finding this thread. I'd once again just expect it to be bad Q&A in qfle3 stack, I had massive stability problems when HPE dropped support for bnx2 that took about 9 months to get fixed.

0 Kudos
dozoekgr
Contributor
Contributor

By the way, my hardware platform is HPE, NIC model is 534FLR-SFP+ that is rebranded Broadcom/QLogic/Cavium/Marvell 57810S.

0 Kudos
agudger
Contributor
Contributor

Hi I have experienced this today, Logged a ticket with VMware support and also found this on our Skyline set up

Link to the KB :VMware Knowledge Base

Im now trying to hunt down that driver package, although not sure why it hasn't been released in to vum as a normal patch yet

Will update further when support come back

Thanks

Adam

0 Kudos
dozoekgr
Contributor
Contributor

Forgot to write back here but qfle3f crash is supposed resolved in latest async drivers. The latest ones are the same as mentioned by you: 2.0.88.0

These async drivers are almost never automatically imported into VUM, you should download bundle from link and import it into VUM manually for deployment.

https://my.vmware.com/group/vmware/details?downloadGroup=DT-ESXI67-QLOGIC-QFLE3-10860&productId=742

0 Kudos
agudger
Contributor
Contributor

Hi

I have managed to find a newer driver on the VMware download section for 6.7, the current version of drivers i am now running are

QLogic-Network-iSCSI-FCoE-v2.0.95-offline_bundle-14038984

So far so good, although the problem didn't show for 26 days the first time.

VMware support have updated me with the following KB, as the resolution is still in testing

VMware Knowledge Base

Thanks

Adam

0 Kudos
cfizz34vmware
Enthusiast
Enthusiast

0 Kudos