VMware Cloud Community
1q2w3e4r
Contributor

Troubleshooting massive vmnic interface discards

I've just checked my SolarWinds Top 10 and found that each of my five ESXi 5 servers is generating up to 500 million (!) receive discards per day on its vmnic interfaces (and zero transmit discards).

I have HP blades with FlexFabric connected to Nexus switches (all 10Gb), although the blades' NICs share their 10Gb between three Ethernet NICs and one HBA.

Can anyone recommend where to begin figuring out why this is happening?
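
For reference, a quick sanity check I can run directly on a host (this is just a sketch, assuming ESXi shell access; vmnic0 is only an example and the exact counter names vary by NIC driver):

     esxcli network nic list
     ethtool -S vmnic0 | grep -iE 'drop|discard'

If the driver's own counters climb in step with what SolarWinds reports, then the discards are at least being counted on the host side and not just an SNMP polling artifact.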

ehurst
Contributor

We have the same problem. Numerous BL465c G7 blades running ESXi 5.0 and a couple of ESXi 5.0 hosts on DL300-series servers. On the blades, we trunk eight FlexNICs over two 10Gb Virtual Connect Flex-10 uplinks to our Cisco 6509 core. SolarWinds shows tens of millions of receive discards for those interfaces and zero transmit discards. None of the DL300-series hosts show those receive discards.

1q2w3e4r
Contributor

I've been doing some testing; my non-ESX blades are all BL460 G7s (Red Hat and Windows), and none of them suffer the same issue. So I'm guessing it's the NC551i NIC or the ESX 55577 driver. ESX 5 is officially not supported with Flex, so that may be part of the reason. I had other Flex issues previously which forced me to upgrade to non-standard firmware and drivers for the NC551i. If I get some time, I'll try downgrading and see if it makes any difference.

silverline
Contributor

I just enabled monitoring for my ESX hosts tonight in Solarwinds and noticed this behavior also.

I am using supported Cisco B-series blades with Palo NICs.

Will open a TAC case and see if it gets me anywhere...

Six9s
Contributor

Does anyone have an update on this thread? I too am experiencing this issue: BL490c G7 hosts on FlexFabric interconnects. All firmware and drivers are current as of May 24th.

NattyG
Contributor

We are having the same issue while using SolarWinds. Did anyone get a fix for this?

We have updated our firmware and drivers to the latest versions as far as possible and we are still getting the discards. I do notice that HP says it supports ESXi 5 in the documentation, but when you go to download the firmware from the website there is no mention of it.

We did find that turning off the following on the C7000 chassis helped for a little while and stopped the VC from getting discards, but now we are getting them on vmnic0.

We turned off network loop protection, and all the discards on our X5 and X6 uplinks and the inter-module stacking links disappeared. But now, as mentioned, we get them on the vmnics of VMware itself.

We know it's not a network or VC issue; it seems to point squarely at VMware on this one.

Makes no sense.

kastlr
Expert

Hi,

please check the output of the following command

     esxcli storage core adapter list

If it contains entries similar to these,

vmhba3    lpfc820      link-up     fc.xxxxxxxxxxxxxxxx:xxxxxxxxxxxxxxxx  (0:4:0.2) ServerEngines Corporation Emulex OneConnect OCe10100 10GbE, FCoE UCNA

vmhba4    lpfc820      link-up     fc.xxxxxxxxxxxxxxxx:xxxxxxxxxxxxxxxx  (0:4:0.3) ServerEngines Corporation Emulex OneConnect OCe10100 10GbE, FCoE UCNA

you should check the following articles.

Slow virtual machine performance when using Emulex OneConnect Converged Network Adapter

General Guidelines for Optimizing IO workloads

And if you're using VLANs, there's the following ESXi 5.0 known issue.

When using the vSphere 5.0 NIC driver with Emulex UCNAs in an HP Flex-10 or IBM Virtual Fabric Adapter (VFA) environment, connectivity may not work properly on Windows virtual machines or on the server when VLANs are configured.

Workaround
Do not use the NIC driver bundled with vSphere 5.0.
Obtain an updated driver from Emulex, HP, or IBM that supports HP Flex-10 or IBM VFA systems.

This statement is taken from the following document.

Emulex Drivers for VMware Release Notes
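
If it helps, a quick way to see whether a host is still on the bundled vSphere 5.0 driver or an updated one (just a sketch; vmnic0 is only an example, and be2net is assumed to be the Emulex OneConnect NIC driver name, so adjust to whatever ethtool reports):

     ethtool -i vmnic0
     esxcli software vib list | grep -i be2net

ethtool -i shows the driver name, driver version, and firmware version for that vmnic, which you can compare against the versions listed in the Emulex release notes.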

Kind regards,

Ralf


Hope this helps a bit.
Greetings from Germany. (CEST)
NattyG
Contributor

Thanks for the reply; I'm not sure what the storage has to do with it.

But we are running QLogic Fibre Channel adapters and Emulex network cards. We have just installed the latest versions of the Emulex and QLogic drivers, but still no difference.

Emulex firmware: 4.1.402.8

Emulex driver: 4.1.334.48

QLogic: 911.k1.1.-26OEM

We have also just installed the latest firmware disk from HP (version 10.10).
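
In case it's useful, this is how you could confirm which vmnics are actually on the Emulex driver and which adapters are the QLogic ones (just a sketch):

     esxcfg-nics -l
     esxcli storage core adapter list

esxcfg-nics -l lists every vmnic with its driver, link state, and speed, so the discarding interfaces can be matched to a specific driver.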

Thanks

kastlr
Expert

Hi,

HP blades with FlexFabric can use FCoE adapters.

If you don't use HP blades with FCoE adapters, this information will be irrelevant for you.

Regards,

Ralf


Hope this helps a bit.
Greetings from Germany. (CEST)
silverline
Contributor

Just checking back in on this.

I am still seeing the same behavior without any resolution, along with multiple other weird behaviors regarding network traffic.

I have gone through all of the performance and troubleshooting guides and have not gotten anywhere. I have also upgraded our UCS firmware through the entire 2.0 code line and we are now on 2.0(3).

The one thing I have seen in the guides that I initially thought might be the issue was high CPU utilization. I am not sure how to rule this out, but I would hope that if the CPU on the host is capable of running the VMs without any spiking issues, it should be able to handle the network. But maybe I am wrong...

I have also tried messing with the different NIC settings, to no avail. I started a discussion here: http://communities.vmware.com/message/2044097#2044097 to try to get people talking about these advanced settings and see whether anyone has had any luck with different combinations, but it didn't really go anywhere.

I imagine this is a problem that many people have but just do not notice without proper monitoring. Maybe it doesn't matter that these packets are dropping, or it is a cosmetic thing; I don't know. It would be really nice if someone from VMware could offer some guidance here beyond the short section in the best practices guide that blames high CPU as the main cause.

Has anyone opened a support ticket on this issue with them yet? We bought our ESX host licenses through Cisco, and TAC just wanted me to get a packet capture of the traffic in question, which didn't go anywhere because we receive too much traffic and these discards aren't continuous. Matching up the discards with a live packet capture on the host is nearly impossible.
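
One thing that may be easier than a packet capture is watching the drop counters live in esxtop (just a suggestion on my part, not something TAC asked for):

     esxtop        (then press 'n' for the network view)

The %DRPRX and %DRPTX columns are shown per uplink and per VM port, so you can at least see whether the drops line up with particular ports or with load spikes.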

NattyG
Contributor

Hi, yes, I should have been clearer about what we use: we only use standard Flex-10 10Gb for Ethernet, and the QLogic adapters for the SAN connection.

NattyG
Contributor

I have raised a ticket, but the support seems similar to HP's, so I'm not putting much faith in a solution. But you never know.

NattyG
Contributor

Hi,

Just to give you all an update: we have been working with VMware on this discard issue and have found the problem to be with the Emulex drivers. We currently have a driver in our environment, and after installing it on three servers we have NO discards, while the other five still show hundreds of millions, even with nothing running on them.

From this, VMware has submitted information to Emulex, so hopefully they will produce a new driver with the fix soon. The problem seems to be that the driver is reporting incorrect information, which is what is causing the issue.
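
For anyone wanting to sanity-check whether the discards are real or just a reporting bug, one rough approach (vmnic0 is only an example) is to snapshot the driver counters twice and see whether guests on that uplink actually lose packets in between:

     ethtool -S vmnic0 > /tmp/stats-before.txt
     (wait a few minutes, then)
     ethtool -S vmnic0 > /tmp/stats-after.txt

If the discard counters climb by millions while a sustained ping between guests on that uplink shows no loss, that supports the idea that the counters are bogus.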

Thanks

Six9s
Contributor

NattyG- Thanks for the update. I really appreciate your efforts to get to the bottom of this issue.

Keith

kastlr
Expert

Hi,

this result sounds nearly identical to what I already mentioned earlier.

And this is already documented in the Emulex Drivers for VMware Release Notes on Page 3.

ESXi 5.0 Known Issues

  1. When using the vSphere 5.0 NIC driver with Emulex UCNAs in an HP Flex-10 or IBM Virtual Fabric Adapter (VFA) environment, connectivity may not work properly on Windows virtual machines or on the server when VLANs are configured.

Workaround

Do not use the NIC driver bundled with vSphere 5.0.
Obtain an updated driver from Emulex, HP, or IBM that supports HP Flex-10 or IBM VFA systems.

So this isn't a new issue, regardless of what VMware support mentioned.

Regards,

Ralf


Hope this helps a bit.
Greetings from Germany. (CEST)
silverline
Contributor

That's great that you guys have a solution.

I don't have hundreds of millions of discards like he is seeing, but I am still seeing tens of thousands of discards per day on my Cisco interfaces.

The Cisco documentation I have read for recent versions recommends using the built-in VMware drivers.

I tried custom drivers a long time ago and they didn't seem to have any impact either.

Anyone have a document for Cisco NICs with a similar fix for me?

NattyG
Contributor

Thank you for your post. The problem you highlight regarding VLANs is and was an issue; it would cause connectivity problems when you VLAN off the management network, vMotion, and other VLANs for different requirements or networks. This was noticeable and caused issues for people trying to use guests on those particular hosts.

The updated driver and firmware fixed it.

What we are seeing is discards in the hundreds of millions. This in itself can indicate performance issues, but what we have found is that the Emulex driver is incorrectly reporting discards when in fact there are none, so all our monitoring tools are going mad for no reason.

This is not the same problem as the one you originally mention, as the driver I was using with VMware was older than the fix Emulex gave for the VLAN issue; this is a separate issue.

NattyG
Contributor

Hi, can you send us the port configuration and also whether the discards are inbound or outbound on the switch? And explain a little about how it is all set up so we can help.

kastlr
Expert

Hi,

which driver did you use?

Only a few days ago, VMware released a new driver for Cisco fnic adapters.

VMware ESXi 5.0 Driver for Cisco fnic (Ver. 1.5.0.8), released 2012-06-25

Maybe it's worth a try if you aren't already running the latest version.
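
Before updating, it may be worth checking which enic/fnic versions the host is actually running (a sketch; the VIB names net-enic and scsi-fnic are what I'd expect on UCS but may differ, and vmnic0 is only an example):

     esxcli software vib list | grep -iE 'enic|fnic'
     ethtool -i vmnic0

That should show both the installed packages and the driver/firmware versions the Ethernet side is using.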

Regards,

Ralf


Hope this helps a bit.
Greetings from Germany. (CEST)
NattyG
Contributor

Thank you for the email; we are running Emulex network cards.
