Mouhamad
Expert

HP Proliant Servers Important Update

PLEASE CHECK THE ADVISORY BELOW AND TAKE ACTION

http://bizsupport1.austin.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&cc=us&taskId=110&prodSe...

SUPPORT COMMUNICATION - CUSTOMER ADVISORY

Document ID: c02964542

Version: 5

Advisory: (Revision) HP ProLiant and HP StorageWorks Systems: HP NC375i, NC375T, NC522m, NC522SFP, NC523SFP, CN1000Q Network Adapters - FIRMWARE UPGRADE REQUIRED to Avoid the Loss and Automatic Recovery of Ethernet Connectivity or Adapter Unresponsiveness

NOTICE: The information in this document, including products and software versions, is current as of the Release Date. This document is subject to change without notice.

Release Date: 2012-02-10

Last Updated: 2012-02-10

IMPORTANT : The network adapter firmware and driver upgrades provided in the Resolution are required to prevent the loss and recovery of Ethernet connectivity, or adapter unresponsiveness requiring a reboot to recover, from occurring. HP recommends performing these upgrades at the customer's earliest possible convenience. Neglecting to perform the recommended action and not performing the recommended resolution could result in the potential for subsequent errors to occur.

The HP network adapters listed in the Scope section (below) may encounter either of the following:

  • The adapter may temporarily lose Ethernet connectivity, and then automatically recover.

OR

  • The adapter may stop responding, requiring a server reboot to recover the operation of the adapter.

Note: There is a low probability of this occurring when operating under a normal network worklo

VCP-DCV, VCP-DT, VCAP-DCD, VSP, VTSP

ICT-Freak
Enthusiast

After the upgrade to the latest drivers and firmware versions we still see network lost redundancy errors and all the other warnings described in the HP document.

We use the following versions:

ethtool -i vmnic4
driver: nx_nic
version: 4.0.602
firmware-version: 4.0.579
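
For anyone comparing several hosts, here is a minimal sketch of how the `ethtool -i` output above could be parsed and the versions compared. It assumes Python is available on a management machine and that the output has been pasted in as text; the function names `parse_ethtool_info` and `version_tuple` are made up for illustration.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i <nic>` output into a dict of key -> value."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that do not have a "key: value" shape
            info[key.strip()] = value.strip()
    return info


def version_tuple(version):
    """Turn a dotted version such as '4.0.602' into (4, 0, 602) for comparison."""
    return tuple(int(part) for part in version.split("."))


# Output copied from the post above.
sample = """driver: nx_nic
version: 4.0.602
firmware-version: 4.0.579
"""

info = parse_ethtool_info(sample)
print(info["driver"], info["version"], info["firmware-version"])
```

Comparing tuples rather than strings avoids mistakes such as "4.0.602" sorting before "4.0.99"; this only works for purely numeric dotted versions.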

Is anyone else still having problems after upgrading the firmware and drivers to the latest versions?

peetz
Leadership

We have been running exactly this driver/firmware combination with NC522SFP adapters on vSphere 4.1 for about two months (on 6x ProLiant DL380G6).

Only once did we have a "firmware hang" of the NIC in one host, resulting in a complete loss of network connectivity. Luckily, no more problems so far.

I say luckily, because when I opened a case with HP regarding this issue, an HP engineer told me that many customers still have problems with this latest firmware, and that HP and QLogic are currently working on fixing this in the next firmware version. He couldn't tell me when we can expect that next version.

I recommend that you open support cases with both VMware and HP regarding your issue.

- Andreas

Twitter: @VFrontDe, @ESXiPatches | https://esxi-patches.v-front.de | https://vibsdepot.v-front.de

Mouhamad
Expert

Hello,

HP advised that this issue will not be resolved by the firmware or driver update. The issue is in the chipset itself, and the NIC cards and motherboard need to be replaced with a revised hardware version.

HP are replacing my NICs and board soon; I will update you on whether this solves my issue.

Regards,

VCP-DCV, VCP-DT, VCAP-DCD, VSP, VTSP

peetz
Leadership

Now, that's interesting ...

Do you have any information on how to identify the faulty chipsets/motherboards? By a specific range of serial numbers or ...?

What servers and what NICs are you using?

Thanks

Andreas

Twitter: @VFrontDe, @ESXiPatches | https://esxi-patches.v-front.de | https://vibsdepot.v-front.de

Mouhamad
Expert

I see in the advisory that these NICs have the issue:

Advisory: (Revision) HP ProLiant and HP StorageWorks Systems: HP NC375i, NC375T, NC522m, NC522SFP, NC523SFP, CN1000Q Network Adapters

As for me, I'm facing the issues on the NC375i and NC375T, which HP are replacing.

I'm not really sure how to identify the chipset fault, but I'm sure that HP are aware of it now.

VCP-DCV, VCP-DT, VCAP-DCD, VSP, VTSP

markzz
Enthusiast

I'm not sure how HP will replace your NC375i, as these are integrated on the main board or, in fact, with the DL58x G7 series, on the main system riser..

The 3 series are on the main board..

Are you saying they are replacing the NC375i, and therefore the riser, with a new riser that has a later revision of the NC375i integrated on it?

Gooose
Enthusiast

Hi there,

We are also running HP DL580 G7s, which use the NC375i (quad card).

For the past 12 months, since we started using HP hardware, we have intermittently experienced an issue where the host stops responding. Looking through the vmkernel logs in the /var/log directory, I found the following error message:

vmkernel: 11:07:44:22.369 cpu6:4355)<3>nx_nic[vmnic0]: Firmware hang detected.
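
To see how often this happens and on which NIC, a log scan along these lines may help. This is only a sketch: the pattern is built from the single line quoted above, so other driver versions may word the message differently, and the function name `count_firmware_hangs` is made up for illustration.

```python
import re
from collections import Counter

# Pattern derived from the one log line quoted in this thread (assumption:
# the driver always logs the hang in this form).
HANG_RE = re.compile(r"nx_nic\[(vmnic\d+)\]: Firmware hang detected")


def count_firmware_hangs(lines):
    """Return a Counter mapping NIC name -> number of firmware-hang messages."""
    hangs = Counter()
    for line in lines:
        match = HANG_RE.search(line)
        if match:
            hangs[match.group(1)] += 1
    return hangs


sample_lines = [
    # Line copied from the post above.
    "vmkernel: 11:07:44:22.369 cpu6:4355)<3>nx_nic[vmnic0]: Firmware hang detected.",
    # Made-up placeholder, only here to show that other lines are ignored.
    "placeholder: an unrelated line that should be ignored",
]

hangs = count_firmware_hangs(sample_lines)
print(dict(hangs))
```

Because the function accepts any iterable of lines, an open log file can be passed to it directly instead of the sample list.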

After several discussions with HP, they asked me to upgrade to the latest driver/firmware. I'm now running the following:

NIC Driver version: 4.0.602

Firmware version: 4.0.579

Even though we are now running the latest versions, we continue to see this intermittent outage.

Earlier this week I decided enough was enough and had an extensive conversation with HP.  I wanted them to send me two dual cards which would allow me to bypass the need for the quad card entirely.  Unfortunately they would not honor this request and asked for more logs to be sent from the server which had recently failed.

Finally a positive result!!!!  HP responded advising that they had analyzed the logs and had seen some issues with the NC375i on the SPI board.  This was the reason we were seeing the intermittent network issues caused by the 'firmware hang'.  As a resolution, HP are now sending us new SPI boards, which apparently have a new, rigorously tested version of the NC375i card integrated into them.  HP have also suggested that the new board has proven to rectify the firmware hang issues.

I'm currently waiting to take delivery of the SPI boards, at which point I plan to install one in one of our ESX hosts and check for stability before proceeding with the installation on my other hosts.

I'll keep you posted as to whether finally my issue is resolved!

Mouhamad
Expert

This is exactly where I ended up with HP. Apparently, this will resolve the problem.

My SPI boards were replaced 3 weeks ago, with no issues reported yet.

VCP-DCV, VCP-DT, VCAP-DCD, VSP, VTSP

Gooose
Enthusiast

Hi Mouhamad,

So hopefully this should resolve my issue as well, seeing as you haven't experienced any issues in the last 3 weeks.

All the best

Gooose
Enthusiast

Hi Mouhamad

Just curious as to whether you have experienced any issues since replacing the SPI cards.

I'm just waiting for delivery of my new cards from HP.

thanks

Mouhamad
Expert

Hello there,

No issues at all. The only things you need to worry about are your iLO configuration (because it will reset) and the BIOS configuration, which will go back to defaults.

Good luck!

VCP-DCV, VCP-DT, VCAP-DCD, VSP, VTSP

Gooose
Enthusiast

Hi Mouhamad

Thanks for the quick response, I'm glad to hear it has resolved your problem.

Can I just confirm exactly what drivers you are running?

We are on the latest which is

NIC Driver version: 4.0.602

Firmware version: 4.0.579

Thanks

markzz
Enthusiast

I have also been fighting this issue for about 12 months now although our issue was a little different.

I have only ever used the onboard port for the Service Console and vMotion activity.

When heavily utilised, the onboard NC375i ports definitely drop or appear to suffer link loss, but then so do the other QLogic NICs..

More of an issue for us has been the NC523SFP cards, which carry NFS and guest traffic, and also the NC375Ts, which carry traffic for some unique segment requirements.

I have had a call logged with both VMware and HP, initially about an issue with the NC523SFP regarding link loss and an apparent card firmware hang.

This call has been ongoing for about a month now.

VMware really could not define any issue with ESX, but HP did eventually concede there are some odd issues with the QLogic cards, both the NC523 and the NC375T. In both cases these issues had supposedly been addressed with the latest firmware... As I pointed out, the previous 2 releases were also supposed to resolve these issues and apparently had truly failed to do so..

After much to-and-fro with support, we replaced the cards with a later hardware release of the NC523. Unfortunately this did not solve anything..

Again some more sending of logs and hours on the phone with HP 1st and 2nd level support guys.

In the end HP could not resolve this issue or make any reasonable suggestion as to how we could resolve it.

HP have now agreed to loan me 2x NC552SFP which will replace the NC523SFP and 2x NC365T which will replace the NC375T.

I'm happy to say I have now had 4 days error free: no link loss, no card hangs, no anything..

In the 12 months these DL585G7s have been running, this is the first time I've not seen logged link loss or hang issues in a 4 day period.

Oh, the other DL585G7s, which still have the original NIC configuration, have experienced issues during this 4 day period.

Of course it's only been 4 days, but so far so good..

Oh, and even though I've not altered the NC375i (the integrated QLogic NICs), they have been stable.

I have for instance vMotioned some 200GB of running system (memory foot print etc) over these nic's and no failed vMotions.. Again this is a first for these servers..

I must admit I'm not sure why the stability of the onboard nic's has improved but the change is incredibly obvious..

I'm now going back to HP to discuss our purchase or swap out options to replace both the NC523SFP's and the NC375T's.

I want to see this QLogic rubbish gone..

markzz
Enthusiast

An update for those interested..

Approx. 1 month operating with the new NIC's..

NO issues.

I've actually ordered some new servers this month..

I've not ordered any NC523 or NC375t's in the bundle..

NC552 and NC365T have replaced them..

I have a case open with HP regarding the on board NIC's

Gooose
Enthusiast

Hi Markzz

I've been running the new SPI board for a couple of weeks now without any issues so hopefully this has resolved the problem.

Will update you again if I do experience any further issues.

markzz
Enthusiast

Thanks for the update Gooose.

Regarding the onboard NC375i NIC's

I've sent HP quite a lot of logs. Initially they were very keen to chase the issue but have gone quiet in the past week.

Maybe it's my turn to chase them up..

What routine did you follow to have them supply you the new riser?

Gooose
Enthusiast

Hi Markzz,

Initially I sent logs to them from the failing hosts, and also made sure I was running the latest firmware and NIC drivers.

I eventually got an email from their support saying that they were aware of an issue, and basically it went from there.

My only advice to you is to keep pestering them everyday and tell them that you want the new SPI boards without having to submit any further logs.  I also raised it as a complaint to get the escalation increased.

Let me know how you get on.

MikeS1983
Enthusiast

Hi All,

Thought I'd just say I've experienced exactly the same issue with our HP servers. This is also affecting our physical Windows servers. I have also had the battle with HP regarding replacements. However they have agreed to send replacement SPI cards.

My VMs are connected to vSwitches which span both the onboard NICs and additional PCIe NIC cards, so it's not been a big issue for us. However, we are getting frequent alerts regarding connectivity to the onboard NICs.

They sent a replacement out for one of our hosts. I replaced it nearly two months ago and so far so good....

Good luck to everyone with this issue.

Mike.

markzz
Enthusiast

Hi Mike and Gooose

Are there any identifying details on the new SPI riser?

As far as I've been able to determine the SPI riser has a spare part number of SP#591199-001 and a hardware version of V.A03

I'm unsure, as it has a bunch of numbers on it.

Could you both have a look at the replacement SPI risers for identifying numbers?

I'm trying to determine the new SPI version and part numbers.

I'm attaching a photo of the current failing SPI

Thank you for your assistance
