VMware Cloud Community
mwburden
Contributor

ESXi 5.5.0 on HP DL360 G6 PSOD on warm boot

Good morning,

I just updated one of my ESXi servers from 5.0 to 5.5. Everything seemed to go OK during the update until the server rebooted, at which point it came up with a PSOD.

If I cold boot the server from a power-off state, it boots fine with no errors and comes up and reports "compliant" to vCenter.

If I reboot it via vCenter, it comes up with the PSOD again.

The details of the PSOD are below. Has anyone else seen anything similar, or does anyone know what the issue is?

I transcribed the PSOD by taking a camera-phone picture of the screen, running it through OCR, and fixing the errors, so if something doesn't make sense, there may still be some typos in it. I've attached the original PSOD screenshot for reference.

VMware ESXi 5.5.0 [Releasebuild-1331820 x86_64]

PANIC bora/vmkernel/main/dlmalloc.c:4892 - Usage error in dlmalloc

cr0=0x8001003d cr2=0x164e8080 cr3=0x800f0000 cr4=0x216c

*PCPU6:32820/helper11-0

PCPU  0: SHSSSSSHSHS

Code start: 0x41803ba00000 VMK uptime: 0:15:35:38.411

0x412380d1dbf0:[0x41803ba8ccd9]PanicvPanicInt@vmkernel#nover+0x575 stack: 0x412300000008

0x412380d1dc50:[0x41803ba8cf1d]Panic_NoSave@vmkernel#nover+0x49 stack: 0x4109509b4000

0x412380d1dc70:[0x41803ba422ef]DLM_free@vmkernel#nover+0x67b stack: 0x20

0x412380d1dcc0:[0x41803ba58017]Heap_Free@vmkernel#nover+0x107 stack: 0x410954b060b0

0x412380d1dce0:[0x41803c1a7b1d]ooo_free_buf_single@com.broadcom.cnic#9.2.2.0+0x3d stack: 0x410954b0

0x412380d1dd20:[0x41803c1a93e5]cnic_free_resc@com.broadcom.cnic#9.2.2.0+0x16d stack: 0x412380d1dda0

0x412380d1dd50:[0x41803c1a674a]cnic_stop_hw@com.broadcom.cnic#9.2.2.0+0x102 stack: 0x0

0x412380d1ddb0:[0x41803c1ad8a2]cnic_ctl@com.broadcom.cnic#9.2.2.0+0x1a6 stack: 0x1

0x412380d1de50:[0x41803c193b0c]bnx2_cnic_stop@com.broadcom.bnx2#9.2.2.0+0x7c stack: 0x1

0x412380d1de90:[0x41803c194784]bnx2_close@com.broadcom.bnx2#9.2.2.0+0x20 stack: 0x412300001018

0x412380d1ded0:[0x41803c0b06b4]dev_close@com.vmware.driverAPI#9.2+0xa5 stack: 0x41088ec56280

0x412380d1def0:[0x41003c8b06b4]CloseNetDev@com.vmware.driverAPI#9.2+0x7c stack: 0x4108a0e28840

0x412380d1df30:[0x41803bc3a923]UplinkAsyncProcessCallsHelperCB@vmkernel#nover+0x223 stack: 0x0

0x412380d1dfd0:[0x41803ba60f8a]helpFunc@vmkernel#nover+0x6b6 stack: 0x0

0x412380d1df10:[0x41803bc53242]CpuSched_StartWorld@vmkernel#nover+0xfa stack: 0x0

base fs=0x0 gs=0x418041800000 Kgs=0x0

Coredump to disk. Slot 1 of 1.

Finalized dump header (12/12) DiskDump: Successful.

Debugger waiting(world 32820) -- no port for remote debugger. "Escape" for local debugger.

23 Replies
memaad
Virtuoso

Hi,

It looks like you are using a CNA. I suspect this PSOD is caused by an issue in the CNA driver (bnx2). Check whether you are using the latest driver for the CNA / HBA / NIC; if not, update it, and open a support request with the hardware vendor and VMware to investigate further.


Regards

Mohammed Emaad

Mohammed | Mark it as helpful or correct if my suggestion is useful.
mwburden
Contributor

On further investigation, it turns out that the DL360 G6 isn't on the HCL for ESXi 5.5, even though it's on the list for 5.1.

I wasn't expecting it to have been dropped on a sub-release.

a_p_
Leadership

According to the VMware Compatibility Guide as well as http://h18004.www1.hp.com/products/servers/vmware/supportmatrix/hpvmware.html the DL360 G6 models are supported for ESXi 5.5. Maybe it's just a firmware issue? Have you already upgraded the host's firmware?


André

memaad
Virtuoso

Hi,

a_p_ is correct: the DL360 G6 is supported with ESXi 5.5. What processor series are you using?

Regards

Mohammed Emaad

mwburden
Contributor

Yeah, the guy who checked the HCL for me selected "DL320" instead of "DL360". Never mind!

mwburden
Contributor

My hardware guy says that the firmware was updated to the latest release last week.

mpogr
Enthusiast

The DL360 G6 is based on socket 1366 Xeons. I have two similar systems (one with two Westmere CPUs and one with a single such CPU) that had stability issues after upgrading to ESXi 5.5; they had been rock stable under ESXi 5.1. What solved the problem for me was turning off CPU C-state support in the BIOS. Would you mind trying this? I'm collecting evidence that there is a problem with C-state support on older Intel CPUs, so I would love to hear your feedback!

ABusch
Enthusiast

We have the same problem with a Dell R710. Dell is currently investigating it. I'll post an update.

mpogr
Enthusiast

The Dell R710 is also an Intel socket 1366-based system. Would you mind switching CPU C-state support off in the BIOS and reporting the results?

ABusch
Enthusiast

Short Update:

Dell replaced the mainboard in this server today, since the issue was thought to be with the NICs. No effect. I disabled C-states in the BIOS: no effect. I did a fresh installation of ESXi 5.5 on this server and restored only the configuration (following KB 2042141). No effect.

So I'm in the same situation as before.

I think I can open a support case with VMware later this week, but for now I need to spend some time on other problems.

Best regards

Alex

mwburden
Contributor

Changing C-State support had no effect.

memaad
Virtuoso

Hi,

I suspect this is an issue with the driver for the CNA you are using. Either upgrade the driver, if a newer one is available, or file a support request with the CNA vendor and VMware.

Regards

Mohammed Emaad

zXi_Gamer
Virtuoso

Hi,

From the stack, it does look like a driver issue, but it still has to be confirmed whether the bnx2 driver or the CNA (cnic) driver is causing the PSOD. Similar DLM_free panics have happened on HP servers before, and the workaround was to update the drivers.
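To make the "which driver is it?" question concrete, here is a minimal Python sketch that pulls the symbol, module and version out of each backtrace frame of a PSOD like the one above. It assumes frames follow the `[return address]symbol@module#version+offset` pattern seen in this trace; `FRAME_RE`, `parse_frames`, `first_non_vmkernel_module` and `SAMPLE_FRAMES` are hypothetical names, and the sample lines are hand-cleaned copies of frames from the original post, so treat it as a reading aid rather than a diagnostic tool.

```python
import re

# A PSOD backtrace frame looks like (as in the trace in the first post):
#   0x412380d1de90:[0x41803c194784]bnx2_close@com.broadcom.bnx2#9.2.2.0+0x20 stack: ...
# Capture the symbol, the module it belongs to, and the module version.
FRAME_RE = re.compile(
    r"\](?P<symbol>[A-Za-z_]\w*)@(?P<module>[\w.]+)#(?P<version>[\w.]+)\+"
)

# Hand-cleaned frames from the original post, innermost (closest to the panic) first.
SAMPLE_FRAMES = [
    "0x412380d1dc70:[0x41803ba422ef]DLM_free@vmkernel#nover+0x67b stack: 0x20",
    "0x412380d1dcc0:[0x41803ba58017]Heap_Free@vmkernel#nover+0x107 stack: 0x410954b060b0",
    "0x412380d1dd20:[0x41803c1a93e5]cnic_free_resc@com.broadcom.cnic#9.2.2.0+0x16d stack: 0x412380d1dda0",
    "0x412380d1dd50:[0x41803c1a674a]cnic_stop_hw@com.broadcom.cnic#9.2.2.0+0x102 stack: 0x0",
    "0x412380d1ddb0:[0x41803c1ad8a2]cnic_ctl@com.broadcom.cnic#9.2.2.0+0x1a6 stack: 0x1",
    "0x412380d1de50:[0x41803c193b0c]bnx2_cnic_stop@com.broadcom.bnx2#9.2.2.0+0x7c stack: 0x1",
    "0x412380d1de90:[0x41803c194784]bnx2_close@com.broadcom.bnx2#9.2.2.0+0x20 stack: 0x412300001018",
]


def parse_frames(lines):
    """Return (symbol, module, version) for every line that looks like a frame."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append((match["symbol"], match["module"], match["version"]))
    return frames


def first_non_vmkernel_module(frames):
    """Innermost frame whose module is not the vmkernel itself, or None."""
    for _symbol, module, _version in frames:
        if module != "vmkernel":
            return module
    return None


if __name__ == "__main__":
    frames = parse_frames(SAMPLE_FRAMES)
    for symbol, module, version in frames:
        print(f"{module:<20} {version:<8} {symbol}")
    print("first non-vmkernel module:", first_non_vmkernel_module(frames))
```

Scanning innermost-first and skipping vmkernel frames lands on com.broadcom.cnic, with com.broadcom.bnx2 as its caller. That narrows the suspects to those two modules, but it does not by itself show which one handed a bad pointer to Heap_Free.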

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=200022...

http://h30507.www3.hp.com/t5/Technical-Support-Services-Blog/More-on-debugging-VMware-Crashes/ba-p/1...

However, just to confirm: are you using the customized ISO provided by HP?

mwburden
Contributor

Yes, we are using the custom HP ISO.

My hardware guy is looking for updated drivers now.

mwburden
Contributor

The installed drivers were already the latest.

freaky2000
Contributor

Any progress on this? Our DL360 G6 is PSOD'ing as well. Not immediately on boot, however; it will happily run for about a week and then PSOD.

We're on the HP ISO too, by the way.

There are, at least now, newer bnx drivers (see the linked page "VMware vSphere 5: Private Cloud Computing, Server and Data Center Virtualization").

ABusch
Enthusiast

You can refer to my support request 13398046111. We use jumbo frames on the Broadcom NICs for iSCSI. The solution was to lower the MTU, turn off the offloading features, and then turn them back on again. Of course, we now use jumbo frames (MTU 9000) again.

No PSOD since then.
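A rough sketch of the MTU part of Alex's workaround, written in Python so the command sequence can be reviewed before anything is typed into the ESXi shell. `esxcli network vswitch standard set --mtu` and `esxcli network ip interface set --mtu` are the usual ESXi 5.x commands for changing MTU, but the vSwitch and VMkernel port names (`vSwitch1`, `vmk1`), the 1500-byte "lowered" MTU, and the helper function names are assumptions. The offload step is left as a comment because the post does not say which features were toggled or how.

```python
# Sketch only: builds, but does not run, the esxcli commands for the MTU part
# of the workaround. vSwitch1 / vmk1 and the 1500-byte MTU are assumptions.


def vswitch_mtu(vswitch, mtu):
    """Command to set the MTU of a standard vSwitch."""
    return f"esxcli network vswitch standard set --vswitch-name={vswitch} --mtu={mtu}"


def vmk_mtu(vmk, mtu):
    """Command to set the MTU of a VMkernel interface (e.g. the iSCSI port)."""
    return f"esxcli network ip interface set --interface-name={vmk} --mtu={mtu}"


def workaround_plan(vswitch, vmk, jumbo_mtu=9000, standard_mtu=1500):
    """Ordered commands: drop to standard frames, then return to jumbo frames."""
    return [
        # Lowering: shrink the VMkernel port first so it never exceeds its vSwitch.
        vmk_mtu(vmk, standard_mtu),
        vswitch_mtu(vswitch, standard_mtu),
        # Turning offload features off and back on would go here; the thread does
        # not name the features or the tool, so no command is given for this step.
        # Raising: grow the vSwitch first, then the VMkernel port.
        vswitch_mtu(vswitch, jumbo_mtu),
        vmk_mtu(vmk, jumbo_mtu),
    ]


if __name__ == "__main__":
    for command in workaround_plan("vSwitch1", "vmk1"):
        print(command)
```

The ordering keeps the VMkernel port's MTU at or below its vSwitch's MTU at every step; that is a conservative choice, not something the thread says was required.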

Best

Alex

mwburden
Contributor

Can you give me a URL for that support request? I can't find it.

mwburden
Contributor

Some more info from my network guy:

Network Adapters

Device     Speed       Configured   Switch     MAC Address         Obse..   Wake on LAN Supported

Broadcom Corporation NC382i Integrated Multi Port PCI Express Gigabit Server Adapter
vmnic1     1000 Full   Negotiate    vSwitch1   00:26:55:xx:xx:xx   172...   Yes
vmnic0     1000 Full   Negotiate    vSwitch0   00:26:55:xx:xx:xx   172...   Yes

Intel Corporation 82571EB Gigabit Ethernet Controller (Copper)
vmnic5     100 Full    Negotiate    vSwitch3   00:24:81:xx:xx:xx   172...   No
vmnic4     100 Full    Negotiate    vSwitch2   00:24:81:xx:xx:xx   172...   No
vmnic3     1000 Full   Negotiate    vSwitch1   00:24:81:xx:xx:xx   172...   No
vmnic2     1000 Full   Negotiate    vSwitch0   00:24:81:xx:xx:xx   172...   No

HP SPP 2013.9.0 shows all firmware up to date.

HP VMware ESXi Release 5.5.0, Build 1331820

After reinstalling everything from scratch, we are still having issues with this machine.

     vSwitch0 reboots fine with jumbo frames (MTU=9000).

     vSwitch1 reboots fine with standard frames, but switching it to jumbo frames causes the PSOD when a reboot is triggered from vCenter.

When we tested jumbo frames on vmnic1 and vmnic3 separately, the system rebooted without a hitch.

