We are having a strange issue, which so far VMware/HP support has not been able to solve.
We have an environment where we are using 2 x 10 Gbps and 2 x 9 Gbps adapters on a single DVS.
The DVS carries ESXi management, vMotion, and data traffic for the VMs.
We have Resource allocation turned on as well.
The blades are G7s.
Our problem is that we are constantly seeing dropped received packets in this cluster. After a lot of research we thought it might be firmware/driver related and went down a whole path of updating our chassis, hosts, drivers, etc. to the latest HP-supported configuration.
Still we are getting dropped received packets. We can recreate it quite easily by initiating a vMotion.
We've tried running on just the 2 x 10 Gbps adapters and on just the 2 x 9 Gbps adapters, but this made no difference.
Anyone got any suggestions?
This is how they are configured:
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
reliability 255/255, txload 1/255, rxload 1/255
Port mode is trunk
full-duplex, 10 Gb/s, media type is 10g
Beacon is turned off
Auto-Negotiation is turned off
Input flow-control is off, output flow-control is off
>Still we are getting dropped received packets. We can recreate it quite easily, by initiating a vMotion.
Please be more specific on this point. Where exactly do you see those dropped packets (check (r)esxtop)? On the physical uplink of the destination host's receive side (%DRPRX), on the physical uplink/vMotion vmkernel port group of the source host (%DRPTX), or somewhere else? What magnitude of dropped packets are we talking about here? Is this just an observation, or is it actively causing issues? vMotion can be quite bursty and heavily utilize a link, so it's not odd if you see "a few" dropped packets.
Does the issue occur between hosts within the same chassis, thus never leaving the VC modules and ruling out physical switches?
Do you have an identical MTU set on all systems, vSwitches, and vmkernel ports? Jumbo frames on one end could cause the drops, apart from general issues such as bad frame CRC sums.
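One quick way to check MTU consistency end-to-end is a ping with the don't-fragment bit set (on ESXi that would be something like vmkping -d -s <size> <peer vmkernel IP>). The payload size has to leave room for the IP and ICMP headers; a minimal sketch of that arithmetic, assuming standard IPv4 without options:

```python
# Largest ICMP echo payload that still fits in one frame for a given MTU
# (standard IPv4 header without options + ICMP echo header).
IP_HEADER = 20
ICMP_HEADER = 8

def max_ping_payload(mtu: int) -> int:
    """Largest -s value for a don't-fragment ping that fits in one frame."""
    return mtu - IP_HEADER - ICMP_HEADER

print(max_ping_payload(1500))  # 1472
print(max_ping_payload(9000))  # 8972
```

If a don't-fragment ping with -s 8972 fails between two hosts while -s 1472 works, the MTU is mismatched somewhere along the path.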
I doubt it's flow-control related, as ESXi will by default only enable flow control for a NIC if the physical switch (or VC module) agrees via auto-negotiation too.
You can check the actual flow control settings on the ESXi side via ethtool --show-pause vmnicX. Enabling flow control on the physical switch ports shouldn't hurt, though.
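If you want to collect the pause settings from many hosts, the ethtool output is easy to parse programmatically. A small sketch, assuming the usual "Pause parameters for ..." output format (the parser and its field names are my own, not part of any tool):

```python
def parse_pause_output(text: str) -> dict:
    """Parse `ethtool --show-pause vmnicX` style output into a dict.

    Assumes the usual format:
        Pause parameters for vmnic21:
        Autonegotiate: off
        RX: on
        TX: on
    """
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the "Pause parameters for vmnicX:" banner (ends with ":").
        if ":" in line and not line.endswith(":"):
            key, _, value = line.partition(":")
            settings[key.strip().lower()] = value.strip().lower() == "on"
    return settings

sample = """Pause parameters for vmnic21:
Autonegotiate: off
RX: on
TX: on"""
print(parse_pause_output(sample))  # {'autonegotiate': False, 'rx': True, 'tx': True}
```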
Hi Mkguy, I work with ncolyer, so I've decided to respond.

We noticed when looking at the performance monitor counters that the received dropped packets increment. I've also been able to capture the dropped received packets by running esxtop in batch mode and then using esxplot. If I just look at a default live esxtop session, %DRPRX doesn't show anything that would indicate dropped packets.

Example of dropped packets (virtual machine with 4 GB of RAM):
Counter before vMotion: 312714
Counter after vMotion: 325272

This occurs within the chassis as well as from chassis to chassis. We have an identical MTU set on all systems, vSwitches, and vmkernel ports.

Here is the flow control setting:
Pause parameters for vmnic21:
Autonegotiate: off
RX: on
TX: on

Thanks, Brad
We experience exactly the same issue: a lot of RX packet drops in our ESXi environment as soon as there is some network traffic on any vmnic. Reproducing it with a vMotion works for us as well. We also have a case open with HP; they forwarded it to Emulex. We are still waiting for an answer.
We experience the problem on all AMD-based G7 blades with an NC551i (Emulex) NIC installed: BL685c G7 and BL465c G7. It does not matter whether they are used in a Flex-10 environment or with a pass-through module; all configurations drop packets. G6 or Intel-based machines in the same enclosures do not drop RX packets at all.
OS: ESXi 5.0 Build 821926
System: BL685c G7 BIOS A20 08/15/2012
Driver: be2net 4.1.334.0
FW Nic: 4.1.450.16
Did you get any further with this issue?
No we haven't had any further traction on this yet.
We have our case open with HP and they've given us a few things to try to isolate it further. We hope to do this in the next few weeks.
Any progress on your side? Our case number with HP is CASE:4685467385, if you want to reference it in any of your support calls with them as well.
I really think HP should be recreating this issue internally, as it doesn't seem too hard to demonstrate. We use vMotions to show that the dropped packet counters go up on the vMotion NICs during every vMotion.
Any chance you are able to provide me your HP case number so I can pass it on to our HP support engineer? I just mentioned your issue to our engineer, and he can't look into it without a case number. I'm hoping HP can find the answer quicker if they have multiple cases.
HP and Emulex did some research.
We got a new firmware and beta driver for the be2net CNA adapters today and did some basic testing.
Driver: be2net 4.2.327.0 (beta)
The problem, IMHO, is partly solved. The RX packet drops decreased drastically but didn't disappear completely. During vMotion peaks we still get some RX packet drops.
According to HP, the RX packet drops were not real drops but a misindication from the driver. But cross-checking with a network analyzer, we still discovered some retransmits. So I guess there were indeed some dropped RX packets 😉
The new firmware and driver should be released soon to public.
Interesting, but be2net driver version "4.2.327.0" was already released back in September:
Are you sure that's the correct version numbering?
The ones we received are explicitly marked beta (although it is the same version as the one from VMware):
The VIB contained in the offline bundle I linked has the exact same version name/build number:
I suspect it's a completely identical driver. Can you post the md5/sha1 hashes of your VIB file for comparison?
Here is the output of the following commands:
/vmfs/volumes/50d31899-d855d0d8-2533-10604bac98ea/Software # md5sum net-be2net-4.2.327.0-1OEM.500.0.0.472560.x86_64.vib
/vmfs/volumes/50d31899-d855d0d8-2533-10604bac98ea/Software # sha1sum net-be2net-4.2.327.0-1OEM.500.0.0.472560.x86_64.vib
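For comparing the two VIBs off-host, the same digests can be computed anywhere with Python's hashlib; a minimal sketch (the commented-out filename is the one from this thread):

```python
import hashlib

def file_digests(path: str, chunk_size: int = 1 << 20):
    """Return (md5, sha1) hex digests of a file, read in chunks."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()

# Two VIBs are byte-identical iff both digests match, e.g.:
# print(file_digests("net-be2net-4.2.327.0-1OEM.500.0.0.472560.x86_64.vib"))
```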