VMware Cloud Community
Dirtrunner
Contributor

New ESXi 5.5 install threw PSOD: RAID controller driver?

Can someone glance at the PSOD I got on a new install of 5.5?

Installed this on Friday night and this Monday morning it was sitting at a purple screen. Ran fine all weekend as far as I can tell.

It's a DL380p G8 with both P420i and P420 RAID controllers.

I'm using the HP-ESXi 5.5.0 ISO, build 1331820.

I think it's yelling about the RAID controller, but I can't say for sure.

Looking at the vmkernel log, I'm seeing these lines over and over:

2014-03-09T13:08:07.636Z cpu2:286677)<4>hpsa 0000:02:00.0: out of memory at vmkdrivers/src_9/drivers/hpsa/hpsa.c:3562

2014-03-09T14:30:45.964Z cpu11:303182)<4>hpsa 0000:0a:00.0: cp 0x410a2b700000 has status 0x2 Sense: 0x5, ASC: 0x20, ASCQ: 0x0, Returning result: 0x2
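Once vmkernel.log has been copied off the host, a quick tally shows how often each controller logs these messages. This is a minimal Python sketch that assumes only the line layout quoted above (timestamp, `cpuN:world)`, `<4>hpsa`, PCI address, message); `tally_hpsa_messages`, `classify`, and the category labels are hypothetical names chosen for illustration, not part of any VMware or HP tool.

```python
import re
from collections import Counter

# Matches hpsa lines in the format quoted above, e.g.
# 2014-03-09T13:08:07.636Z cpu2:286677)<4>hpsa 0000:02:00.0: out of memory at ...
HPSA_LINE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)<\d>hpsa "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)$"
)


def classify(msg):
    """Bucket an hpsa message into a coarse (hypothetical) category."""
    if msg.startswith("out of memory"):
        return "out of memory"
    if msg.startswith("cp ") and "Sense:" in msg:
        return "command error"
    return "other"


def tally_hpsa_messages(lines):
    """Count hpsa messages per (PCI address, category); other lines are skipped."""
    counts = Counter()
    for line in lines:
        m = HPSA_LINE.match(line.strip())
        if m:
            counts[(m.group("pci"), classify(m.group("msg")))] += 1
    return counts
```

Call it as `tally_hpsa_messages(open("vmkernel.log", errors="replace"))`. Keying on the PCI address matters here because the two lines above come from different addresses (02:00.0 and 0a:00.0), which presumably correspond to the two Smart Array controllers.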

I attached vmkwarning.log, vmkernel.log, and a screenshot of the error.

Thank you guys!

grasshopper
Virtuoso

Hi,

Indeed, it appears that you have discovered a potential memory leak that is starving the vmkernel or otherwise causing mayhem. Please SSH into the host and capture a support bundle ASAP while it's still fresh. Do this by typing vm-support. The logs will be saved in /var/tmp.

You should also grab the vmkernel zdump by running esxcfg-dumppart -L against the zdump filename. This creates a file that is very useful for support when debugging the diagnostic screen. Remember that the zdump you are targeting is probably in /var/core, but the file it outputs will be saved in your current directory. So I like to cd to /var/tmp before running this; that way all my logs are in one place.

Optional: this is contained in the vm-support bundle, but it may be handy for quick reference. Run 'esxcli software vib list > /var/tmp/my-vibs.txt' so you have a list of all software installed on the host.
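After copying my-vibs.txt to a workstation, a short Python sketch can pull out the scsi-hpsa entry (or compare the lists from two hosts). It assumes the column order of `esxcli software vib list` output (Name, Version, Vendor, Acceptance Level, Install Date) and that only the vendor column can contain spaces; `parse_vib_list` and `Vib` are hypothetical names.

```python
from typing import Dict, NamedTuple


class Vib(NamedTuple):
    name: str
    version: str
    vendor: str
    acceptance: str
    install_date: str


def parse_vib_list(text: str) -> Dict[str, Vib]:
    """Parse saved `esxcli software vib list` output into {name: Vib}.

    Assumes whitespace-separated columns where only the vendor may contain spaces.
    """
    vibs = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the column header, and the dashed separator row.
        if len(parts) < 5 or parts[0] == "Name" or line.lstrip().startswith("-"):
            continue
        name, version = parts[0], parts[1]
        acceptance, install_date = parts[-2], parts[-1]
        vendor = " ".join(parts[2:-2])
        vibs[name] = Vib(name, version, vendor, acceptance, install_date)
    return vibs
```

For example, `parse_vib_list(open("my-vibs.txt").read())["scsi-hpsa"].version` returns the installed driver version string.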

Gather all of that using WinSCP (in SCP mode, not SFTP) and share it with HP and VMware Support (also include the screenshot for good measure). HP will be interested to know which SPP you are running (i.e., whether the firmware is the latest, etc.). Also check the iLO logs to see whether any disk failures or RAID rebuilds were aggravating the issue.

Let us know if you need anything.

jrmunday
Commander

Agreed with grasshopper.

Check what version of the scsi-hpsa VIB you have installed, and update this if required (including hardware firmware).

Here is an example from my nested lab running on my laptop:

~ # esxcli software vib list | grep -i scsi-hpsa
scsi-hpsa                      5.5.0-44vmw.550.0.0.1331820           VMware  VMwareCertified   2013-12-09

~ # esxcli software vib get -n scsi-hpsa
VMware_bootbank_scsi-hpsa_5.5.0-44vmw.550.0.0.1331820
   Name: scsi-hpsa
   Version: 5.5.0-44vmw.550.0.0.1331820
   Type: bootbank
   Vendor: VMware
   Acceptance Level: VMwareCertified
   Summary: hpsa: scsi driver for VMware ESX
   Description: HP Smart Array SCSI Driver
   ReferenceURLs:
   Creation Date: 2013-09-19
   Depends: vmkapi_2_2_0_0, com.vmware.driverAPI-9.2.2.0
   Conflicts:
   Replaces:
   Provides:
   Maintenance Mode Required: True
   Hardware Platforms Required:
   Live Install Allowed: False
   Live Remove Allowed: False
   Stateless Ready: True
   Overlay: False
   Tags: driver, module
   Payloads: scsi-hps

~ #

It looks like there were previous issues with this VIB, so I wouldn't be too surprised if not all of them were resolved:

VMware KB: VMware ESXi 5.0, Patch ESXi500-201310204-UG: Updates VMware ESXi 5.0 scsi-hpsa vib

Cheers,

Jon

vExpert 2014 - 2022 | VCP6-DCV | http://www.jonmunday.net | @JonMunday77
Dirtrunner
Contributor

Thank you both. Your help is immensely appreciated!

I will get a support ticket going, just as grasshopper recommends.

As for jrmunday's post: I used the latest HP ISO, which installed the 5.5.0.58-1OEM.550.0.0.1331820 VIB.

I also checked the RAID controller firmware: it is at 3.04, while the current version seems to be 5.22. The backplane expander also has firmware updates that can be applied.

I'm going to schedule a maintenance window for this server, flash the latest and greatest, and report back on whether the logs still show that out-of-memory message or anything else funky. Hopefully this will help someone else out one day.

jrmunday
Commander

Hopefully the latest firmware and drivers will help resolve this. I had an issue recently where HP-branded QLogic 2560 HBAs had old firmware but the hosts (HP DL380p Gen8) had new drivers. In this case the FC ports would constantly flap up and down and crash the hosts with a PSOD. As soon as I flashed the firmware and updated the drivers to the latest supported version, all the issues disappeared.

YVesli
Contributor

I have the same problem. Did the firmware update help resolve the issue?

Dirtrunner
Contributor

Yes, updating the firmware did solve the issue, along with installing the latest drivers for the controllers.

javier_dp
Contributor

I hit this yesterday as well. Uptime was 18 days.

The server is a BL460 G8, ESXi build 1623387.

Hewlett-Packard_bootbank_scsi-hpsa_5.5.0.58-1OEM.550.0.0.1331820
   Name: scsi-hpsa
   Version: 5.5.0.58-1OEM.550.0.0.1331820
   Type: bootbank
   Vendor: Hewlett-Packard
   Acceptance Level: VMwareCertified
   Summary: hpsa: scsi driver for VMware ESX
   Description: HP Smart Array SCSI Driver
   ReferenceURLs:
   Creation Date: 2013-12-16
   Depends: vmkapi_2_2_0_0, com.vmware.driverAPI-9.2.2.0
   Conflicts:
   Replaces:
   Provides:
   Maintenance Mode Required: True
   Hardware Platforms Required:
   Live Install Allowed: False
   Live Remove Allowed: False
   Stateless Ready: True
   Overlay: False
   Tags: driver, module
   Payloads: scsi-hps

jonsaville
Contributor

We have seen this twice since upgrading to 5.5.0 in February and to 5.5.0u1 in March.

The server is an HP DL160 G6 with the latest BIOS. Storage is a P410 with four SATA drives in two mirrors, plus iSCSI for near-line storage. Minimally loaded.

We upgraded to 5.5.0u1 in response to the first incident, but have just seen our second. The P410 firmware was out of date (v5); it was patched today to 6.40.

scsi-hpsa is 5.5.0.58-1OEM.550.0.0.1331820 and seems to be the culprit:

2014-04-12T09:26:42.353Z cpu0:6329569)<4>hpsa 0000:07:00.0: cmd_special_alloc returned NULL!

2014-04-12T09:26:42.358Z cpu0:6329569)<4>hpsa 0000:07:00.0: out of memory at vmkdrivers/src_9/drivers/hpsa/hpsa.c:3562

2014-04-12T09:27:12.354Z cpu0:6329642)WARNING: LinDMA: dma_alloc_coherent:726: Out of memory
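When copied out of vmkernel.log, WARNING lines such as the LinDMA one can arrive wrapped in terminal highlighting codes (ESC[7m ... ESC[0m), sometimes with the ESC byte lost in the paste. The Python sketch below strips those codes and flags the three signatures quoted in this post, returning each hit's timestamp. The `SIGNATURES` table, its labels, and the helper names are hypothetical, built only from the lines above; it is not an official list of hpsa failure messages.

```python
import re

# ANSI SGR sequences such as ESC[7m (reverse video) and ESC[0m (reset);
# the ESC byte is optional because copy/paste often drops it, leaving bare "[7m".
SGR = re.compile(r"\x1b?\[[0-9;]+m")

# Hypothetical signature table built from the three log lines quoted in this thread.
SIGNATURES = {
    "cmd_special_alloc returned NULL": "hpsa command allocation failed",
    "out of memory at vmkdrivers/src_9/drivers/hpsa/": "hpsa out of memory",
    "LinDMA: dma_alloc_coherent": "DMA allocation out of memory",
}


def clean(line):
    """Remove terminal highlighting codes and trailing whitespace."""
    return SGR.sub("", line).rstrip()


def find_signatures(lines):
    """Return (timestamp, label) for each line matching a known signature."""
    hits = []
    for raw in lines:
        line = clean(raw)
        for needle, label in SIGNATURES.items():
            if needle in line:
                hits.append((line.split(" ", 1)[0], label))
                break
    return hits
```

Seeing the allocation failure, the hpsa out-of-memory line, and the LinDMA warning within a few seconds of each other is the pattern reported here.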

Our PSOD is slightly different (Failed to ack TLB invalidate). Attached.

This machine (and another identical one) had been running ESXi 5.0u1 for 18 months with zero problems, so this is a little frustrating.

PSoD.JPG

MillardJK
Enthusiast

Ding ding ding!

Add another install with the issue. Similar to Javier, we're on BL460c Gen8 w/5.5.0. The most annoying part: these are diskless blades, booting from SD card and using FC SAN for all storage. There isn't a newer version from HP, so I'm just going to remove the VIB and hope that does the trick.

——
Jim Millard
Kansas City, MO USA
rabittom
Contributor

Hi all,

I updated a BL460 G7 a couple of weeks ago, and it went smoothly. After two weeks of uptime, I was faced with a PSOD.

VMware Support analyzed it and told me to update the HP firmware (which was outdated at that time).

I ran SPP 2014.02 with the latest hotfixes and patched to ESXi build 1746018.

After one week of uptime the same thing happened again: a PSOD with the same indicators.

Last weekend I had the same experience on another Bladecenter with the same blade types.

Two hosts crashed with a PSOD at more or less the same time.

I opened a case with VMware Support, and they told me the following:

[Snip]

This is a known issue to VMware.
It is a problem with the hpsa driver.

We are advising anyone that has this PSOD to open an HP Support Request and reference HP case 4648045806.

HP are working on an updated driver to resolve this issue.

[Snip]

I contacted HP Brazil (where the affected Bladecenter is located). They told me that HP is aware of this issue and has been working on it since April 2014.

I contacted HP US (where I have some connections): no answer yet.

I contacted HP Austria (where our HQ is located): no answer yet.

I've halted the 5.5 upgrade for the remaining 50+ hosts until HP finds it worthwhile to inform its customers.

BTW: I also have a c7000 with BL460 Gen8 blades (all on build 1746018), with no problems so far.

CU

klimenta
Contributor

I had the same issue yesterday. The screenshot from the original post is pretty much the same as what I got on a BL460c Gen8 blade with a P220i Smart Array controller.

Call VMware; they have an updated driver that, at this point, is only available through them.

scsi-hpsa-5.5.0.58-2OEM.550.0.0.1198611.x86_64.vib is the file that needs to be uploaded and installed.

The original driver is the same version but with the "-1OEM" suffix.
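Because the fixed and original drivers share the 5.5.0.58 version and differ only in the OEM revision (1OEM vs 2OEM), a tiny Python sketch can tell which one a filename or version string refers to. It assumes the `<driver version>-<N>OEM.<esx>.<x>.<y>.<build>` pattern seen in this thread; `parse_oem_version`, `is_later_oem`, and `OemVersion` are hypothetical names. The sketch compares only (driver version, OEM revision) and ignores the trailing build number, since the newer 2OEM package actually carries a lower build (1198611) than the 1OEM one (1331820).

```python
import re
from typing import NamedTuple, Optional, Tuple

# e.g. "5.5.0.58-2OEM.550.0.0.1198611", also found inside a .vib filename
OEM_VERSION = re.compile(r"(\d+(?:\.\d+)+)-(\d+)OEM\.(\d+)\.\d+\.\d+\.(\d+)")


class OemVersion(NamedTuple):
    driver: Tuple[int, ...]  # e.g. (5, 5, 0, 58)
    oem_rev: int             # the N in "NOEM"
    esx: str                 # e.g. "550"
    build: int               # trailing build number, e.g. 1198611


def parse_oem_version(text: str) -> Optional[OemVersion]:
    """Extract the OEM-style version from a version string or VIB filename."""
    m = OEM_VERSION.search(text)
    if m is None:
        return None  # e.g. VMware's own "5.5.0-44vmw..." has no OEM revision
    driver = tuple(int(x) for x in m.group(1).split("."))
    return OemVersion(driver, int(m.group(2)), m.group(3), int(m.group(4)))


def is_later_oem(candidate: str, installed: str) -> bool:
    """True if candidate has a higher (driver version, OEM revision) than installed."""
    a, b = parse_oem_version(candidate), parse_oem_version(installed)
    if a is None or b is None:
        raise ValueError("not an NOEM-style version string")
    return (a.driver, a.oem_rev) > (b.driver, b.oem_rev)
```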

jfbordenjr
Contributor

Did scsi-hpsa-5.5.0.58-2OEM.550.0.0.1198611.x86_64.vib help you? I have installed it and am testing now.

Thanks,

John

klimenta
Contributor

So far so good with "2OEM". Nine days without a problem.

rabittom
Contributor

I talked to VMware Support, and they confirmed that this is a known bug. HP released that driver version, but it was a beta, and by the time I called, HP no longer allowed it to be deployed.

The guy told me that HP had released internal information saying that, "if everything goes fine," they will have an official driver ready by this weekend. So we have to wait...

This is the official statement from VMware Support:

************************************************************************

The recurring PSOD that your ESXi hosts are receiving is a known issue.

VMware and HP are working closely on finding a resolution for this problem.

We have identified that the problem is with the hpsa driver.

We know that this PSOD is caused by an out-of-memory condition, but we don't know what triggers the issue.

HP are currently working around the clock to release an updated driver that should resolve this issue.

All things going well, HP are hoping to have the driver this weekend; please note this is subject to change.

We currently only have internal documentation on this issue.

However, VMware and HP are currently working on a public-facing document that we can provide to customers who hit this issue.

************************************************

I've been working with HP for quite a long time, so I assume the driver will not be ready this weekend. Any chance I could get the offline bundle from you?

thanks

tom

klimenta
Contributor

rabittom
Contributor

Hey Klimenta,

thanks a ton!!!

EnterpriseSuppo
Contributor

Hi Klimenta,

How did you get the driver? Did HP or VMware provide it?

klimenta
Contributor

VMware.

pz5445
Contributor

Is this a beta or an official driver? If it's a beta, when will the official release become available?
