VMware Cloud Community
kur_Andy
Contributor

ESXi 6.7U2 Host - freeze and not responsive

Hi Community,

I have an ESXi Host that suddenly froze 2 days ago without apparent reason.

See attachment for error message.

There was no hardware failure.

I had to reboot the server and the host is running normally now.

I need to find the reason for the failure.

Is there something in the error message I can go by?

What log files should I check to find a more descriptive error message?

Thank you.

Lalegre
Virtuoso

Hey,

Could you please collect the core dump by following this KB: VMware Knowledge Base.

From there we will be able to review what happened.

kur_Andy
Contributor

Hi,

Attached is the log file I extracted from the zdump.

Lalegre
Virtuoso

Hey Andy,

It seems that the HPE iLO driver (hpilo) caused the issue. The log is full of lines like this:

  • Failed to open file 'hpilo-d0ccb15

The crash message itself (a page fault, #PF Exception 14) also mentions IPMI:

  • #PF Exception 14 in world 2098575:IPMI Respons IP 0x418032922ad0 addr 0x18.

Further down, the log contains the backtrace for PCPU 4, which was executing the faulting instructions:

  • 2020-10-11T07:37:45.406Z cpu4:2098575)Backtrace for current CPU #4, worldID=2098575, fp=0x430c4a748ad0

Below that, you can see that PCPU 4 was running the IPMI process:

  • 2020-10-11T07:37:45.432Z cpu4:2098575)pcpu:4 world:2098575 name:"IPMI Response Processing"
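
The cross-referencing done by hand above (match the world ID in the #PF line against the per-PCPU world listing) can be sketched in a few lines of Python. This is only an illustration: the regular expressions are modeled on the lines quoted in this post, `summarize_crash` is a hypothetical helper name, and other ESXi builds may format these lines differently.

```python
import re

# Patterns modeled on the log lines quoted in this thread (an assumption;
# other ESXi builds may format these lines differently).
PF_RE = re.compile(
    r"#PF Exception (?P<exc>\d+) in world (?P<world>\d+):(?P<name>.+?) "
    r"IP (?P<ip>0x[0-9a-fA-F]+) addr (?P<addr>0x[0-9a-fA-F]+)"
)
WORLD_RE = re.compile(r'pcpu:(?P<pcpu>\d+) world:(?P<world>\d+) name:"(?P<name>[^"]+)"')


def summarize_crash(lines):
    """Find the faulting world, then look up its full name and PCPU.

    Assumes the #PF line appears before the per-PCPU world listing,
    as it does in the log excerpt quoted in this thread.
    """
    summary = {}
    for line in lines:
        m = PF_RE.search(line)
        if m:
            summary["exception"] = int(m["exc"])
            summary["world_id"] = int(m["world"])
            summary["fault_addr"] = m["addr"]
            continue
        m = WORLD_RE.search(line)
        # Only keep the world entry whose ID matches the faulting world.
        if m and int(m["world"]) == summary.get("world_id"):
            summary["pcpu"] = int(m["pcpu"])
            summary["world_name"] = m["name"]
    return summary


sample = [
    "#PF Exception 14 in world 2098575:IPMI Respons IP 0x418032922ad0 addr 0x18.",
    '2020-10-11T07:37:45.432Z cpu4:2098575)pcpu:4 world:2098575 name:"IPMI Response Processing"',
]
print(summarize_crash(sample))
```

In practice you would pass the lines of the log extracted from the zdump instead of `sample`.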

Could you please run the following command on the ESXi host to get the version of the hpilo driver: esxcli software vib list | grep ilo

kur_Andy
Contributor

Hi,

This is what I get with: esxcli software vib list | grep ilo

ilo                            670.10.2.0.2-1OEM.670.0.0.7535516     HPE     PartnerSupported  2020-08-06

The firmware and drivers of the ProLiant DL380 Gen10 are from SPP 2020.03.0.
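
For anyone collecting this from several hosts, a row like the one above can be split into named fields with a short Python sketch. The field names are my own labels for the five columns, `parse_vib_row` is a hypothetical helper, and the whitespace split assumes that no column value contains a space.

```python
def parse_vib_row(row):
    """Split one row of `esxcli software vib list` output into named fields.

    Assumes exactly five whitespace-separated columns, as in the row
    quoted in this thread; a value containing spaces would break this.
    """
    fields = row.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 columns, got {len(fields)}: {row!r}")
    name, version, vendor, acceptance_level, install_date = fields
    return {
        "name": name,
        "version": version,
        "vendor": vendor,
        "acceptance_level": acceptance_level,
        "install_date": install_date,
    }


row = "ilo                            670.10.2.0.2-1OEM.670.0.0.7535516     HPE     PartnerSupported  2020-08-06"
vib = parse_vib_row(row)
print(vib["name"], vib["version"])
```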

Lalegre
Virtuoso

Hey,

That SPP has been withdrawn because of vulnerabilities, and HPE recommends upgrading to SPP 2020.03.2: https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX_8ccab9e4b99047138c2e608f97#tab-history

I wanted to check which ilo VIB ships in the HPE Custom ISO from VMware, but the page is under maintenance. Also check the following drivers, which are recommended for 6.7: https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX-6a25570f556a41b38d0b7bd72a#tab-history

kur_Andy
Contributor

Ok, thank you.

I will update the firmware and drivers to the latest SPP, 2020.09.0, as soon as possible.

Tarabee
Contributor

We have recently started experiencing the same problem. Does updating the firmware and drivers fix it?

zhang_py
Contributor

Hi, what is the command for querying the firmware and driver versions on an ESXi system?
