ben_turner_
Contributor

HP FlexFabric 20Gb 2-port 650FLB - Gen9 networking inconsistency

I have come across an interesting issue with a new HPE platform. The system is running within a C7000 BladeSystem, with BL460c Gen9 blades.

We have noticed some performance degradation on the iSCSI connections (using the software iSCSI initiator). This traffic runs over vmnic1 and vmnic2; details from the NIC list are below.

vmnic1  0000:06:00.1  elxnet  Up    Up    10000  Full  32:a6:05:e0:00:be  1500  Emulex Corporation HPE FlexFabric 20Gb 2-port 650FLB Adapter
vmnic2  0000:06:00.2  elxnet  Up    Up    10000  Full  32:a6:05:e0:00:bd  1500  Emulex Corporation HPE FlexFabric 20Gb 2-port 650FLB Adapter
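
For anyone wanting to double-check the iSCSI port binding on a similar setup, something along these lines should do it (vmhba64 below is just a placeholder; the real adapter name comes from the first command):

# Find the software iSCSI adapter name
esxcli iscsi adapter list

# List the vmknic port bindings for that adapter (substitute your adapter name)
esxcli iscsi networkportal list -A vmhba64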

Each NIC is reporting 10000 Mb Full; however, I am not able to set the speed on the ESXi server. vmnic1 reports the following advertised link modes:

[root@ESX:~] esxcli network nic get -n vmnic1

   Advertised Auto Negotiation: true

   Advertised Link Modes: 1000BaseKR2/Full, 10000BaseKR2/Full, 20000BaseKR2/Full, Auto

   Auto Negotiation: true

Whereas vmnic2 reports the following modes:

[root@ESXi2b-14:~] esxcli network nic get -n vmnic2

   Advertised Auto Negotiation: false

   Advertised Link Modes: 20000None/Full

   Auto Negotiation: false

This is confusing, as the settings for both are identical within OneView. Both NICs are running firmware 12.0.1110.11 from SPP 2018.06.0. The HPE-customised ESXi image has been used, including driver version 12.0.1115.0, which is listed as compatible in the VMware Compatibility Guide - I/O Device Search.

Has anyone else seen this issue? If I try to set the speed/duplex manually via esxcli, it fails with the following error in vmkernel.log:

2018-08-14T23:49:41.361Z cpu20:65677)WARNING: elxnet: elxnet_linkStatusSet:7471: [vmnic2] Device is not privileged to do speed changes

As a result, when using HCIBench to test storage throughput, the 95th-percentile latency is excessively high when traffic traverses vmnic2: 95%tile_LAT = 3111.7403 ms.
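
For anyone wanting to see whether the latency lines up with drops on the NIC, the standard counters can be sampled like this (plain esxcli plus busybox grep, nothing exotic):

# Sample error/drop counters twice, 10 seconds apart, to see if they increment under load
esxcli network nic stats get -n vmnic2 | grep -iE 'error|drop'
sleep 10
esxcli network nic stats get -n vmnic2 | grep -iE 'error|drop'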

Any thoughts??

SupreetK
Commander

Interesting :) Can you share the complete output of the below commands?

esxcli network nic get -n vmnic1

esxcli network nic get -n vmnic2

Cheers,

Supreet

ben_turner_
Contributor

Sure thing.

[root@ESX:~] esxcli network nic get -n vmnic1

   Advertised Auto Negotiation: true

   Advertised Link Modes: 1000BaseKR2/Full, 10000BaseKR2/Full, 20000BaseKR2/Full, Auto

   Auto Negotiation: true

   Cable Type:

   Current Message Level: 4631

   Driver Info:

         Bus Info: 0000:06:00:1

         Driver: elxnet

         Firmware Version: 12.0.1110.11

         Version: 12.0.1115.0

   Link Detected: true

   Link Status: Up by explicit linkSet

   Name: vmnic1

   PHYAddress: 1

   Pause Autonegotiate: true

   Pause RX: true

   Pause TX: true

   Supported Ports:

   Supports Auto Negotiation: true

   Supports Pause: true

   Supports Wakeon: true

   Transceiver: external

   Virtual Address: 00:50:56:59:d7:63

   Wakeon: MagicPacket(tm)

[root@ESX:~] esxcli network nic get -n vmnic2

   Advertised Auto Negotiation: false

   Advertised Link Modes: 20000None/Full

   Auto Negotiation: false

   Cable Type:

   Current Message Level: 4631

   Driver Info:

         Bus Info: 0000:06:00:2

         Driver: elxnet

         Firmware Version: 12.0.1110.11

         Version: 12.0.1115.0

   Link Detected: true

   Link Status: Up by explicit linkSet

   Name: vmnic2

   PHYAddress: 0

   Pause Autonegotiate: true

   Pause RX: true

   Pause TX: true

   Supported Ports:

   Supports Auto Negotiation: false

   Supports Pause: true

   Supports Wakeon: false

   Transceiver: external

   Virtual Address: 00:50:56:58:05:51

   Wakeon: None

Really hoping that this isn't something simple that I have missed.

Thanks, Ben.

ben_turner_
Contributor

I also tried to set the interface to 10Gb Full via esxcli:

esxcli network nic set -n vmnic2 -S 10000 -D full

It failed, as expected:

2018-08-15T10:26:55.023Z cpu17:68364 opID=e4ebaba5)Uplink: 14445: Setting speed/duplex to (10000 FULL) on vmnic2.

2018-08-15T10:26:55.024Z cpu47:65677)WARNING: elxnet: elxnet_linkStatusSet:7419: [vmnic2] Speed 10000 is not supported on this phy interface (0xc)

I have a case open with HPE on this too, interesting indeed.

SupreetK
Commander

Per my understanding, the below could be the issue here:

esxcli network nic get -n vmnic1          

  Bus Info: 0000:06:00:1 --> PF 1

esxcli network nic get -n vmnic2

  Bus Info: 0000:06:00:2 --> PF 2

In a multi-channel mode, the same physical port is shared among multiple PFs (physical functions). PF 1 could be the primary PF, and PF 2 could be treated as a non-primary PF.

The Emulex firmware might not allow non-primary PFs to modify port-level settings such as auto-negotiation.

This would avoid multiple PFs choosing conflicting settings, which is not possible since the physical port is the same, and it may be why we are seeing the below error in the logs:

2018-08-14T23:49:41.361Z cpu20:65677)WARNING: elxnet: elxnet_linkStatusSet:7471: [vmnic2] Device is not privileged to do speed changes
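
A quick way to sanity-check this theory (just a sketch; the output format can vary by build) is to list the PCI functions behind the adapter and see how the vmnics map to function numbers:

# One line per PCI function of the Emulex adapter;
# the digit after the final dot is the function number
lspci | grep -i emulex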

Good that you have already involved HPE on this. I would be very eager to hear what they have to say about it :)

Please consider marking this answer as "correct" or "helpful" if you think your questions have been answered.

Cheers,

Supreet

ben_turner_
Contributor

Thanks for the input so far Supreet.

In our case vmnic1 and vmnic2 are using two different physical ports, as they leave the chassis via different interconnects.

Still chasing HPE on this; I'm sending a collection of log files over to them now. I'll keep you posted with their response.

Cheers, Ben.

SupreetK
Commander

Ahh! I'll be eagerly waiting to see how this pans out :)

Cheers,

Supreet

a_p_
Leadership

Just guessing.

  • Which network interconnect modules (type/model) do you use in the C7000 chassis?
  • You mentioned "SPP 2018.06.0". Has Virtual Connect already been updated to firmware 4.62 (if applicable)?
  • Do both ports - to which the BL460c is connected - have the same VC profile assigned?

André

ben_turner_
Contributor

Morning André

  • We have HP VC FlexFabric-20/40 F8 Module installed in the C7000.
  • Yes, the firmware is running 4.62.
  • The ports have different profiles, assuming I am reading OneView correctly. The profiles are identical except for the member ports in the uplink sets: one profile uses interconnect 1 X8, whereas the other uses interconnect 2 X8.

Cheers, Ben.

ben_turner_
Contributor

I have had an interesting development in this that I thought I would share.

HPE are working on this now, though I don't expect a resolution any time soon.

We are using the HPE customised ESXi 6.5 U2 image, which includes elxnet driver version 12.0.1115.0.

   Driver Info:

         Bus Info: 0000:06:00:2

         Driver: elxnet

         Firmware Version: 12.0.1110.11

         Version: 12.0.1115.0

Running this version of the driver, the NIC does not list the correct advertised speeds.

[root@ESXi1a-21:~] esxcli network nic get -n vmnic2

   Advertised Auto Negotiation: false

   Advertised Link Modes: 20000None/Full

   Auto Negotiation: false

Other NICs on the same host, however, display the correct speed advertisements.

[root@ESXi1a-21:~] esxcli network nic get -n vmnic1

   Advertised Auto Negotiation: true

   Advertised Link Modes: 1000BaseKR2/Full, 10000BaseKR2/Full, 20000BaseKR2/Full, Auto

   Auto Negotiation: true

If I install ESXi 6.5 U2 via a direct download from VMware, this installs elxnet driver version 11.1.91.0.

   Driver Info:

         Bus Info: 0000:06:00:2

         Driver: elxnet

         Firmware Version: 12.0.1110.11

         Version: 11.1.91.0

Running this version of the driver, the NIC likewise does not list the correct advertised speeds.

[root@localhost:~] esxcli network nic get -n vmnic2

   Advertised Auto Negotiation: false

   Advertised Link Modes: 20000None/Full

   Auto Negotiation: false

If I use the HPE 6.0 U3 image, this installs elxnet driver version 12.0.1115.0, which exhibits the same issue as the 6.5 U2 image.

Now for the interesting part. If I install ESXi 6.0 U3 from the stock VMware download, elxnet driver version 10.2.309.6v is included.

   Driver Info:

         Bus Info: 0000:06:00:2

         Driver: elxnet

         Firmware Version: 12.0.1110.11

         Version: 10.2.309.6v

This driver version reports the correct available speeds.

[root@localhost:~] esxcli network nic get  -n vmnic2

   Advertised Auto Negotiation: true

   Advertised Link Modes: 1000baseT/Full, 10000baseT/Full, 20000baseT/Full

   Auto Negotiation: false

Nothing else has changed at all on the system, other than the ESXi image that has been used.

I'm curious whether anyone else has ever come across this. It looks like a potential driver issue, but if it is, I don't understand how it hasn't been noticed before.
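
For anyone comparing builds, this is a quick way to confirm which elxnet driver is actually installed and what the running driver reports (it just matches the Driver Info output above):

# Show the installed elxnet driver VIB and its version
esxcli software vib list | grep -i elxnet

# Cross-check against what the running driver reports for a given uplink
esxcli network nic get -n vmnic2 | grep -iE 'driver|version'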

SupreetK
Commander

Very interesting :) What if you install the latest VMware native driver on 6.5 U2? Does the issue persist? This is just to isolate whether it is a problem with all versions of the elxnet async driver.
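
For reference, swapping the driver in usually looks something like the below (the datastore path and bundle filename are placeholders; use the offline bundle for the exact driver version you download, and reboot afterwards):

# Install the elxnet driver from an offline bundle (placeholder path/filename)
esxcli software vib install -d /vmfs/volumes/datastore1/elxnet-offline-bundle.zip

# A reboot is needed for the new driver to take effect
reboot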

Cheers,

Supreet

ben_turner_
Contributor

Certainly does Supreet.

For the sake of playing devil's advocate I even installed 6.7, and the native driver there exhibits the same problem.

HPE are due to get access to lab hardware today/tomorrow to start replicating. I'll keep you posted!

SupreetK
Commander

Would love to know the end of this :) Thank you for keeping us posted.

Cheers,

Supreet

Skins4ev4
Contributor

I am seeing the same thing; let me know what you find out. Not being able to have a consistent host profile is driving me nuts. FYI, I am using:

   Driver: elxnet

         Firmware Version: 12.0.1110.11

         Version: 11.4.1205.0

ben_turner_
Contributor

Interesting, good to hear that we are not alone with this issue. HPE have gone very quiet on this one at the moment; I will keep the thread up to date as and when updates come through.

ben_turner_
Contributor

I have some progress from HPE!

They have now been able to replicate the fault and have acknowledged that this could well be a driver issue :)

The issue has now been escalated from the L2 engineers to the L3 engineers for further testing. They have also said that they will be looking for other customers that have reported this issue globally. If anyone has this issue, please log a support request with HPE, then drop me an email/message on VMTN and I'll pass you my incident number to reference so HPE can tie the cases together.

Cheers, Ben.

Futuzz
Contributor

Hello, I have this same issue, but we do not use iSCSI; instead we use FC. We experience disconnections from our redundant paths to our SAN, and only the hosts with the hardware in the title are affected. Any updates from HPE?

ben_turner_
Contributor

Still very much a work in progress with HPE at the moment. I'm still pushing them; the latest is that they need to work with VMware and the hardware vendor, with the potential for a new driver to be developed.

Interesting to hear that I am not the only one seeing this issue. If you have the ability to log this with HPE, more cases with the same issue will strengthen the case.

I will keep this thread up to date with anything useful though when/if it arises.

tvanholland
Contributor

Any word from HPE on this? We are running into the same issue. An interesting little twist to what we are seeing is that if we put a significant amount of load on vmnic2 or vmnic3, the links will drop completely. HPE hasn't been able to pin down the issue for us yet.
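
In case it helps others correlate, this is roughly how we watch for the drops while the load test runs (the vmnic names match our setup; adjust to yours):

# Follow vmkernel.log live and pull out messages for the suspect uplinks
tail -f /var/log/vmkernel.log | grep -iE 'vmnic2|vmnic3'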

ben_turner_
Contributor

I have had some feedback, but nothing of any significance. They then proceeded to close my case, despite my request for more time to test and produce more evidence; while there seems to be some merit to the details below, they don't answer why I see the latency.

--------------------------------------------------------------------------------------------------------------------------------------------------
This behaviour is expected in case of multi-channel modes.

The same physical port will be shared among multiple logical functions in case of a multi-channel mode.

For example, Port #A is associated with even numbered logical functions (i.e. 0,2,4,6, etc).
and Port #B is associated with odd numbered logical functions (i.e. 1,3,5,7, etc.).

Emulex Firmware design is such that, only primary logical functions (i.e. logical function 0 for Port #A and logical function 1 for Port #B) are privileged to modify Port level features like PortSpeed, Autoneg, etc..

This is to avoid multiple logical functions choosing different settings which is not possible since the physical port is same.

That is the reason that the driver is not advertising negotiation for those non-primary logical functions.

---------------------------------------------------------------------------------------------------------------------------------------------------
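
Based on that explanation, mapping each vmnic to its logical function number is simple enough; the function number is the last digit of the PCI address shown in the NIC list (per HPE, only functions 0 and 1 would be primary on this card):

# The PCI address is shown per vmnic; the digit after the final
# dot/colon is the logical function number
esxcli network nic list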

I have pulled a server from the cluster and intend to do further testing on this. Recent workload has prevented me from focusing on the issue, but I am hoping to have some time for testing in the next week or so.
