mpasquier
Contributor

Crash ESXi 5.5 after upgrade from 5.1

Hi all,

I have an issue with an ESXi host freshly upgraded from 5.1 to 5.5. It ran without problems right after the upgrade, but it now gets a purple screen from time to time (about twice a month).

It's running on an HP ML350 G6 server. The upgrade was done using the HP sources.

Here is what I get in the dump file:

2014-03-16T07:35:52.451Z cpu2:2544604)WARNING: LinDMA: dma_alloc_coherent:726: Out of memory

2014-03-16T07:35:52.451Z cpu2:2544604)<4>hpsa 0000:0e:00.0: cmd_special_alloc returned NULL!

2014-03-16T07:35:52.451Z cpu2:2544604)WARNING: LinDMA: dma_alloc_coherent:726: Out of memory

2014-03-16T07:35:52.451Z cpu2:2544604)<3>hpsa 0000:0e:00.0: cmd_special_alloc returned NULL!

2014-03-16T07:35:52.451Z cpu2:2544604)<3>hpsa1: set_sas_ids: report extended physical LUNs failed.

2014-03-16T07:35:52.455Z cpu5:2544603)<4>hpsa 0000:04:00.0: out of memory at vmkdrivers/src_9/drivers/hpsa/hpsa.c:3562

2014-03-16T07:36:11.210Z cpu3:37604)<4>hpsa 0000:0e:00.0: cp 0x410970db3280 has status 0x2 Sense: 0x2, ASC: 0x3a, ASCQ: 0x0, Returning result: 0x2

2014-03-16T07:36:11.212Z cpu4:32793)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x1a (0x412e807be880, 0) to dev "mpx.vmhba32:C0:T0:L0" on path "vmhba32:C0:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE

2014-03-16T07:36:11.212Z cpu4:32793)ScsiDeviceIO: 2337: Cmd(0x412e807be880) 0x1a, CmdSN 0x5d60 from world 0 to dev "mpx.vmhba32:C0:T0:L0" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.

2014-03-16T07:36:11.219Z cpu5:33299)<4>hpsa 0000:04:00.0: cp 0x410970d91500 has status 0x2 Sense: 0x5, ASC: 0x20, ASCQ: 0x0, Returning result: 0x2

2014-03-16T07:36:11.219Z cpu5:33299)<4>hpsa 0000:04:00.0: cp 0x410970d91000 has status 0x2 Sense: 0x5, ASC: 0x24, ASCQ: 0x0, Returning result: 0x2

2014-03-16T07:36:22.453Z cpu3:2544693)WARNING: LinDMA: dma_alloc_coherent:726: Out of memory

2014-03-16T07:36:22.453Z cpu3:2544693)<4>hpsa 0000:0e:00.0: cmd_special_alloc returned NULL!

2014-03-16T07:36:22.453Z cpu3:2544693)WARNING: LinDMA: dma_alloc_coherent:726: Out of memory

2014-03-16T07:36:22.453Z cpu3:2544693)<3>hpsa 0000:0e:00.0: cmd_special_alloc returned NULL!

2014-03-16T07:36:22.453Z cpu3:2544693)<3>hpsa1: set_sas_ids: report extended physical LUNs failed.

2014-03-16T07:36:22.458Z cpu3:2544692)<4>hpsa 0000:04:00.0: out of memory at vmkdrivers/src_9/drivers/hpsa/hpsa.c:3562
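For anyone reading a similar dump: a rough way to see which controller hits the allocation failures, and what the sense codes in the hpsa lines mean, is a small script over the extracted log. This is only a sketch in Python, assuming the log is available as plain text lines in the format shown above. The function names `tally_alloc_failures` and `decode_sense` are made up for this example, and the lookup tables contain only the standard SCSI codes that appear in the excerpt, not a full table.

```python
import re
from collections import Counter

# hpsa driver lines carry a PCI address, e.g. "hpsa 0000:0e:00.0: <message>"
HPSA_LINE = re.compile(
    r"hpsa (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): (?P<msg>.*)"
)
# Sense triple as printed by the driver: "Sense: 0x5, ASC: 0x20, ASCQ: 0x0"
SENSE = re.compile(
    r"Sense: (0x[0-9a-f]+), ASC: (0x[0-9a-f]+), ASCQ: (0x[0-9a-f]+)", re.I
)

# Only the standard SCSI codes seen in the excerpt (assumption: extend as needed)
SENSE_KEYS = {0x2: "NOT READY", 0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x3A, 0x00): "MEDIUM NOT PRESENT",
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}


def tally_alloc_failures(lines):
    """Count hpsa allocation-failure messages per PCI address."""
    counts = Counter()
    for line in lines:
        m = HPSA_LINE.search(line)
        if not m:
            continue
        msg = m.group("msg")
        if "returned NULL" in msg or "out of memory" in msg:
            counts[m.group("pci")] += 1
    return counts


def decode_sense(line):
    """Return (sense key, ASC/ASCQ meaning) for an hpsa status line, or None."""
    m = SENSE.search(line)
    if not m:
        return None
    key, asc, ascq = (int(value, 16) for value in m.groups())
    return SENSE_KEYS.get(key, "unknown"), ASC_ASCQ.get((asc, ascq), "unknown")


# A few lines copied from the dump above
SAMPLE = [
    "2014-03-16T07:35:52.451Z cpu2:2544604)<4>hpsa 0000:0e:00.0: cmd_special_alloc returned NULL!",
    "2014-03-16T07:35:52.455Z cpu5:2544603)<4>hpsa 0000:04:00.0: out of memory at vmkdrivers/src_9/drivers/hpsa/hpsa.c:3562",
    "2014-03-16T07:36:11.219Z cpu5:33299)<4>hpsa 0000:04:00.0: cp 0x410970d91500 has status 0x2 Sense: 0x5, ASC: 0x20, ASCQ: 0x0, Returning result: 0x2",
    "2014-03-16T07:36:22.453Z cpu3:2544693)<4>hpsa 0000:0e:00.0: cmd_special_alloc returned NULL!",
]

for pci, count in sorted(tally_alloc_failures(SAMPLE).items()):
    print(pci, count)
print(decode_sense(SAMPLE[2]))
```

Run against the full log instead of `SAMPLE`, this would show whether the failures are concentrated on one controller. It does not say anything about the cause.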

The internal SCSI controller firmware is up to date. There is no cluster and no vCenter installed or configured; the host runs standalone with an Essentials license.

Does anyone have any idea, please?

Sincerely

Martin

11 Replies
ShiFtySnip
Contributor

Same here, exactly the same setup on an HP ML350 G6. I have already had a purple screen twice in the last 11 days.

PCPU 1 locked up. Failed to ack TLB invalidate.

All this started after the upgrade from 4.3 to 5.5, when I also started to work with iSCSI-connected storage.

Please advise!

pratjain
VMware Employee

Could you please extract the log from the PSOD core dump available in /var/core using the command esxcfg-dumppart -L <vmkernel-zdump filename> and attach the resulting vmkernel-log file?

Detailed steps are available in VMware KB http://kb.vmware.com/kb/1006796
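If there are several dumps in the core directory, a small helper can pick the most recent one and print the command to run. This is only a sketch in Python: it assumes the dump files start with `vmkernel-zdump` as in the command above, the helper names `newest_zdump` and `extract_log_command` are made up for this example, and the `-L` usage is taken from this post (check the KB before relying on it).

```python
import glob
import os


def newest_zdump(core_dir="/var/core"):
    """Return the most recently modified vmkernel-zdump* file, or None."""
    dumps = glob.glob(os.path.join(core_dir, "vmkernel-zdump*"))
    return max(dumps, key=os.path.getmtime) if dumps else None


def extract_log_command(zdump):
    """Build the esxcfg-dumppart call quoted in the post above."""
    return ["esxcfg-dumppart", "-L", str(zdump)]


dump = newest_zdump()
if dump:
    # Print the command rather than running it, so it can be reviewed first
    print(" ".join(extract_log_command(dump)))
else:
    print("no vmkernel-zdump file found")
```

The script only prints the command; running it is left to you on the ESXi shell.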

Regards, PJ
lvaibhavt
Hot Shot

Is the BIOS of the server up to date?

ShiFtySnip
Contributor

Hi, thanks for the reply. I failed to extract the log, but I captured the purple screen with my camera; I'll upload it here.

This is the latest one:

photo.JPG

And this is from 12 days ago:

photo2.JPG

mpasquier
Contributor

Hi,

After a discussion and offline diagnostics with HP, my HP contact gave me these links.

Update HP firmware :

       HP Service Pack for ProLiant:
http://h18004.www1.hp.com/products/servers/management/spp/index.html

Update Bundle :

* RECOMMENDED * HP ESXi Offline Bundle for VMware vSphere 5.5

http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/swdDetails/?sp4ts.oid=3884316&sp...

* RECOMMENDED * HP ESXi Utilities Offline Bundle for VMware vSphere 5.5

http://h20565.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/swdDetails/?sp4ts.oid=3884316&sp...


It seems that BIOS version D22 has multiple releases that are all named exactly the same, D22. I will try installing these bundles and updates and report back in two or three weeks on whether it was successful.


Thanks for your replies


Martin Pasquier

Sanktuary
Contributor

Hi,

Same here with the ML350 G6. I have three of them and two of them have exactly that error.

Did the HP management software update solve your problem?

If so, did your HP support contact say anything about other hardware that is affected by this error?

For example, the DL380G8?

Thanks for your feedback.


Cheers

Florian

BTW: In my case it was a new installation with the HP customized ISO and a completely up-to-date system (installed around March).

mpasquier
Contributor

Hi Sanktuary,

Yes, those bundles and firmware upgrades solved my problem! It has been running like a champ since my last message, without any problem \o/

HP didn't tell me about issues on other hardware.

What I found strange is that I was already running the D22 BIOS, but the D22 file from my HP contact had a different size. I installed it, and it seems to be a newer build of D22.

I also upgraded the SAS backbone firmware and installed the new ESXi bundles that drive the SAS hardware. It's running fine now. So maybe try to find bundles and/or firmware upgrades for your DL380G8.

Hope this helps

Regards

Martin

Sanktuary
Contributor

Hi mpasquier,

Thanks for your feedback.

Hm, I saw that the SPP behind the link is not up to date, but I have downloaded it just now.

I will update (or maybe downgrade) my ESX early tomorrow, and after that I will install the offline bundles.

I'm hoping that solves the problem ;)

Thanks again

Florian

humblemumble
Contributor

Hello,

There is an issue with the hpsa driver causing PSODs and out-of-memory errors. You might want to have a look at:

http://h20566.www2.hp.com/portal/site/hpsc/template.PAGE/public/psi/topIssuesDisplay?javax.portlet.b...

br/S

Sanktuary
Contributor

Hm, in the end I still had my problem even after the BIOS upgrade.

In my case it was a power setting in the BIOS. With 5.5, VMware enables by default an option that corresponds to Collaborative Power Control.

On the ML350 G6 this BIOS option is on, and it has problems with the default setting in VMware.

So HP told me to perform the steps below (boot into the BIOS):

> Press F9 to enter setup.

> Select Power Management Options.

> Select Advanced Power Management Options.

> Select Collaborative Power Control.

> Select disabled.

mpasquier
Contributor

Hello,

I will be out of the office until 14.07.2014. For any urgent request, please contact the service desk at 0800 111 100.

Best regards

Martin Pasquier
