I just had the exact same experience as you with an almost identical hardware configuration. I found the only way to stop the high ingress packet loss was to change the default distributed port group teaming and failover settings from "Route based on originating virtual port" to "Use explicit failover order" with a defined Active uplink and all others as Standby.
HP servers, multiple brands of switches (tried during testing/troubleshooting), multiple brands of NICs (ending with Intel X520/X540), so it didn't seem to be hardware/driver related. (A scripted version of the teaming change is sketched below.)
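In case anyone wants to script the same change, here is a rough pyVmomi sketch of flipping a dvPortGroup to "Use explicit failover order". The vCenter name, credentials, port group name and uplink names are all placeholders for whatever you have, and it's a lab-style sketch (certificate checking disabled), not a polished tool:

```python
#!/usr/bin/env python
# Rough pyVmomi sketch: set a dvPortGroup to "Use explicit failover order".
# VCENTER, USER, PWD, PORTGROUP and the uplink names are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

VCENTER = "vcenter.lab.local"              # placeholder
USER = "administrator@vsphere.local"       # placeholder
PWD = "changeme"                           # placeholder
PORTGROUP = "vSAN-dvPortGroup"             # placeholder: the vSAN port group
ACTIVE = ["Uplink 1"]                      # placeholder: active uplink name(s)
STANDBY = ["Uplink 2"]                     # placeholder: standby uplink name(s)

ctx = ssl._create_unverified_context()     # lab only; validate certs in production
si = SmartConnect(host=VCENTER, user=USER, pwd=PWD, sslContext=ctx)
content = si.RetrieveContent()

# Locate the distributed port group by name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.dvs.DistributedVirtualPortgroup], True)
pg = next(p for p in view.view if p.name == PORTGROUP)
view.Destroy()

# Teaming policy: explicit failover with a defined active/standby uplink order.
teaming = vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortTeamingPolicy(
    inherited=False,
    policy=vim.StringPolicy(inherited=False, value="failover_explicit"),
    uplinkPortOrder=vim.dvs.VmwareDistributedVirtualSwitch.UplinkPortOrderPolicy(
        inherited=False, activeUplinkPort=ACTIVE, standbyUplinkPort=STANDBY))

spec = vim.dvs.DistributedVirtualPortgroup.ConfigSpec(
    configVersion=pg.config.configVersion,
    defaultPortConfig=vim.dvs.VmwareDistributedVirtualSwitch.VmwarePortConfigPolicy(
        uplinkTeamingPolicy=teaming))

task = pg.ReconfigureDVPortgroup_Task(spec)
print("Reconfigure task submitted: %s" % task.info.key)
Disconnect(si)
```

The same change is of course just a couple of clicks in the port group's Teaming and failover settings; the script only really pays off if you have several port groups to change.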
Thanks, I'll give it a try and see how it goes.
I know it has to be a switch/hardware thing, as I have the exact same config VMware-wise at work for our massive vSAN on VxRail clusters, and the counters flatline at 0% day in, day out!
(The only difference is Dell server kit vs the HPE I have, and Cisco network gear versus my Dell)
Surely there has to be a logical solution, or some simple setting we've forgotten to configure on the switches?
Were you experiencing actual issues such as latency or user complaints? We see packet loss percentages of 600% as well, but we don't know if it correlates to actual problems.
Is it 600% (per cent) or 600‰ (per mille)?
What is your virtual/physical network configuration?
Got exactly the same issue here. I've set the vSAN dvSwitch port group to "Use explicit failover order", but no change.
Using Intel X552 NICs (onboard on a Supermicro X10 motherboard) on a Netgear 8-port 10GbE switch.
The first step I would take is looking at the firmware AND driver for the NICs. There is no NIC option on the vSAN VCG, but you should be looking at the vSphere VCG for this. It is VERY important to have the firmware AND driver at the same level. This is known as the FW/driver combination, and a mismatch here will most likely cause you grief (I see this almost weekly... sadly). What you need to avoid is having the latest driver with firmware that is two years old. The vSphere VCG will point you to the recommended combination (a quick way to dump the current combination from the ESXi shell is sketched after this post).
The other problem I've been running into lately is the switch. Remember you will be pushing storage through whatever switch you have: if you wouldn't place traditional storage traffic through the switch you have vSAN on, you probably shouldn't be using it for vSAN traffic either. Be aware of switches with high port-to-port latency and shallow buffers. I understand that networking is often a black box for vSphere/storage admins, but please take it into consideration, and size properly according to your needs.
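On the FW/driver combination point: here is a small sketch for the ESXi shell (which ships a Python interpreter) that prints each vmnic's driver and firmware version so you can compare the combination against the vSphere VCG entry for your card. It just parses esxcli output, and the exact field labels may vary slightly between releases:

```python
#!/usr/bin/env python
# Rough sketch for the ESXi shell: print driver name, driver version and
# firmware version per vmnic, to compare with the vSphere VCG listing.
import subprocess

def esxcli(*args):
    """Run an esxcli command and return its output as text."""
    return subprocess.check_output(("esxcli",) + args).decode()

# vmnic names come from "esxcli network nic list".
nics = [line.split()[0]
        for line in esxcli("network", "nic", "list").splitlines()
        if line.startswith("vmnic")]

for nic in nics:
    for line in esxcli("network", "nic", "get", "-n", nic).splitlines():
        line = line.strip()
        # The "Driver Info" section of "esxcli network nic get" holds these fields.
        if line.startswith(("Driver:", "Version:", "Firmware Version:")):
            print("%s  %s" % (nic, line))
```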
After upgrading my home lab recently, I also see a high inbound packet loss counter and TCP retransmits.
NICs: Intel X520 10GbE
Switch: MikroTik 10GbE
Troubleshooting so far: upgraded the firmware and driver on the NIC for one of the hosts, which did not seem to help.
I have a colleague who has the same problem with his home lab, and he is running HP ProLiant servers.
NICs: some Broadcom variant, I think.
Did any of you solve it? Found this thread as well: https://www.reddit.com/r/vmware/comments/93fbuw/massive_vsan_latency_increase_on_upgrade_to_67/
Any help with where to continue troubleshooting is appreciated.
Just an update on this: it is a known issue with a fix incoming. I'm also fairly sure public documentation is in the works, and I will keep an eye on that for updates.
So that you are aware: this is cosmetic and non-impactful. If you were experiencing that rate of loss, it would be very apparent from performance issues (and if you were getting packet drops due to genuine network issues/misconfiguration, they likely wouldn't be observed in just one place, e.g. pNicRxPortDrops).
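One way to convince yourself it really is cosmetic is to watch the physical NIC's own drop counters while the vSAN charts show the inflated figure. A rough sketch for the ESXi shell follows; "vmnic0" is a placeholder for whichever uplink carries your vSAN traffic:

```python
#!/usr/bin/env python
# Rough sketch for the ESXi shell: sample the pNIC's RX drop counter over a
# minute and compare with what the vSAN performance charts claim.
import subprocess
import time

NIC = "vmnic0"   # placeholder: the uplink carrying vSAN traffic

def rx_drops(nic):
    """Read 'Receive packets dropped' from esxcli network nic stats."""
    out = subprocess.check_output(
        ["esxcli", "network", "nic", "stats", "get", "-n", nic]).decode()
    for line in out.splitlines():
        if "Receive packets dropped" in line:
            return int(line.split(":")[1])
    return 0

before = rx_drops(NIC)
time.sleep(60)
after = rx_drops(NIC)
print("RX drops on %s over 60s: %d" % (NIC, after - before))
```

If the pNIC counter stays flat while the UI reports hundreds of percent of "loss", the UI number is the cosmetic artefact the incoming fix is targeting.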