-
15. Re: Write latency and network errors
rafficvmware Mar 13, 2018 7:55 PM (in response to MichaelGi)Did anyone get a solution for this?
-
16. Re: Write latency and network errors
rphoon Mar 23, 2018 7:19 AM (in response to MichaelGi)Just wondering if you have checked the upstream switches and the MTU settings on all the vmkernel NICs. Mismatched MTUs may cause retries and network inconsistencies.
-
17. Re: Write latency and network errors
InfiniVirt Mar 26, 2018 9:48 AM (in response to MichaelGi)We are having the very same issue, albeit much worse. We are seeing latencies surpassing 1400 ms (!) on a relatively empty 12-node VSAN stretched cluster (SR# 18750505903). The link between sites is less than 30% used with >1ms latency. The issue was discovered when a SQL Server VM with a 1.5TB DB was migrated into the cluster and began having major application issues.
VSAN 6.2 , ESXi 6.0.0 3620759.
Cisco UCS C240M4 hardware with Enterprise-grade SAS SSD/HDDs. Cluster is completely symmetrical. Hosts consist of 2 disk groups of 8 disks: (1) 400GB Enterprise SAS SSD / (7) 1.2TB 10K SAS HDD. VSAN HCL validated multiple times for incorrect drivers, firmware and even hardware. All check out.
I'm not seeing any pause frames on the upstream UCS Fabric Interconnects. Flow Control is not configured either, nor does it appear to be configurable on the VIC 1227:
[root@-------vsan-06:~] esxcli system module parameters list -m enic
Name Type Value Description
----------------- ---- ----- -------------------------------------------------------------------------
heap_initial int Initial heap size allocated for the driver.
heap_max int Maximum attainable heap size for the driver.
skb_mpool_initial int Driver's minimum private socket buffer memory pool size.
skb_mpool_max int Maximum attainable private socket buffer memory pool size for the driver.
[root@-------vsan-06:~] ethtool -a vmnic5
Pause parameters for vmnic5:
Cannot get device pause settings: Operation not supported
Per KB2146267 I tried disabling the dedup scanner but this did not improve anything. I also updated the pNIC drivers and that didn't help either.
-
18. Re: Write latency and network errors
LeslieBNS9 Apr 2, 2018 7:19 AM (in response to MichaelGi)We are also seeing a lot of these errors on our All Flash vSAN environment. We've been doing some testing and think we have narrowed down the issue.
We have 6 hosts with the following configuration:
SuperMicro 1028U-TR4+
2xIntel E5-2680v4
512GB RAM
X710-DA2 10GB Network Adapters (Dedicated for vSAN, not shared)
Cisco 3548 Switches (Dedicated for vSAN, not shared)
We went through different drivers/firmware on our X710s, but so far none of that has made a difference.
We noticed on our Cisco switch that all of the interfaces connected to our vSAN were having discards on a regular basis (multiple times every hour). We opened a support case with Cisco to troubleshoot this and found that ALL of our vSAN ports have bursts of traffic that are filling up the output buffers on the switch. During these bursts/full buffers the switch discards the packets.
So I would check on your switches to see if you are having any packet discards.
At this point Cisco is recommending we move to a deep-buffer switch. I spoke with VMware support to see if there is a specific switch (or buffer size) they recommend, but they said they just require a 10Gb switch. I find this frustrating, as we have 2 expensive switches we are only using 6 ports on and may not be able to add any more hosts to.
Ethernet1/2 queuing information:
qos-group sched-type oper-bandwidth
0 WRR 100
Multicast statistics:
Mcast pkts dropped : 0
Unicast statistics:
qos-group 0
HW MTU: 16356 (16356 configured)
drop-type: drop, xon: 0, xoff: 0
Statistics:
Ucast pkts dropped : 180616
Ethernet1/2 is up
Dedicated Interface
Hardware: 100/1000/10000 Ethernet, address: 00d7.8faa.cf09 (bia 00d7.8faa.cf09)
MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec
reliability 255/255, txload 2/255, rxload 4/255
Encapsulation ARPA
Port mode is access
full-duplex, 10 Gb/s, media type is 10G
Beacon is turned off
Input flow-control is off, output flow-control is off
Rate mode is dedicated
Switchport monitor is off
EtherType is 0x8100
Last link flapped 4d12h
Last clearing of "show interface" counters 3d23h
0 interface resets
Load-Interval #1: 30 seconds
30 seconds input rate 98177624 bits/sec, 4262 packets/sec
30 seconds output rate 124356600 bits/sec, 4302 packets/sec
Load-Interval #2: 5 minute (300 seconds)
input rate 163.09 Mbps, 6.20 Kpps; output rate 113.03 Mbps, 6.33 Kpps
RX
2620601947 unicast packets 5716 multicast packets 335 broadcast packets
2620612576 input packets 10625804438347 bytes
1353181073 jumbo packets 0 storm suppression bytes
0 runts 0 giants 0 CRC 0 no buffer
0 input error 0 short frame 0 overrun 0 underrun 0 ignored
0 watchdog 0 bad etype drop 0 bad proto drop 0 if down drop
0 input with dribble 0 input discard
0 Rx pause
TX
2619585440 unicast packets 0 multicast packets 2452 broadcast packets
2619587892 output packets 9072740199246 bytes
1162617883 jumbo packets
0 output errors 0 collision 0 deferred 0 late collision
0 lost carrier 0 no carrier 0 babble 180616 output discard
0 Tx pause
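To put the counters above in perspective, some back-of-the-envelope arithmetic (the per-port buffer size below is an assumed figure for illustration, not the 3548's actual allocation):

```python
# Back-of-the-envelope math on the 'show interface' counters quoted above.
output_discards = 180_616
output_packets = 2_619_587_892

# The discard rate looks negligible as a percentage...
discard_pct = 100 * output_discards / output_packets
print(f"output discard rate: {discard_pct:.5f}%")

# ...but averages hide microbursts. The 30-second average egress rate is
# ~124 Mbit/s, while a burst arrives at line rate (10 Gb/s). An assumed
# 1 MB per-port buffer absorbs such a burst for only:
buffer_bits = 1_000_000 * 8      # assumed shallow per-port buffer, in bits
line_rate_bps = 10e9             # burst arrival rate
drain_rate_bps = 124e6           # observed average egress rate
fill_us = buffer_bits / (line_rate_bps - drain_rate_bps) * 1e6
print(f"buffer survives a line-rate burst for ~{fill_us:.0f} µs before discarding")
```

A sub-0.01% discard rate still hurts vSAN because each discard forces a TCP retransmit, which shows up as a write-latency spike rather than as a visible bandwidth problem.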
-
19. Re: Write latency and network errors
Great_White_Tec Apr 2, 2018 7:33 AM (in response to LeslieBNS9)For NIC issues, here is a typical checklist:
- Make sure the NICs are on the vSphere VCG
- Not only make sure that Firmware and Drivers are up to date (latest), BUT also that there are no mismatches
- Mismatches between these two have been known to cause some issues, in particular packet drops, based on my experience
- For the X710 (X71x & X72x), disabling LRO / TSO has resolved a lot of the issues encountered in the past.
- See Jase McCarty's script for this: Vsan-SetTsoLro.ps1 in the jasemccarty/Vsan-Settings repo on GitHub
-
20. Re: Write latency and network errors
LeslieBNS9 Apr 2, 2018 8:09 AM (in response to Great_White_Tec)"For the X710 (X71x & X72x), disabling LRO / TSO has resolved a lot of the issues encountered in the past."
We are aware of the LRO/TSO errors and the firmware/driver version recommendations for the X710's and have already been through all of those settings.
-
21. Re: Write latency and network errors
LeslieBNS9 Apr 2, 2018 8:11 AM (in response to Great_White_Tec)Also all of our hardware is on the HCL and has matching drivers/firmware.
I actually posted another thread specific to my issue at All-Flash vSAN Latency & Network Discards (Switching Recommendations)
I just wanted to give the poster here some reference in case they are seeing the same thing we are seeing.
-
22. Re: Write latency and network errors
InfiniVirt Apr 3, 2018 1:14 PM (in response to InfiniVirt)Thanks LeslieBNS9. I believe we are experiencing a similar cause.
Instead of uplinking our UCS servers directly to switches, they first connect to 6248 Fabric Interconnects, which then uplink to Nexus 7010s via (2) 40GE vPCs. The Fabric Interconnects are discarding packets, as evidenced by "show queuing interface" on all active vSAN interfaces. The way we have the vmnics situated in VMware (Active/Standby), Fabric B is effectively dedicated to VSAN traffic, and the cluster is idle, so it's not a bandwidth issue or even contention; rather, the FI's scrawny buffer assigned to custom QoS System Classes in UCS can't handle bursts. We have QoS configured per the Cisco VSAN Reference doc. Platinum CoS is assigned qos-group 2, which only has a queue/buffer size of 22720! NXOS in the UCS FIs is read-only, so this is not configurable.
I will probably disable the Platinum QoS System Class and assign VSAN vNICs to Best Effort so we can at least increase the available queue size to 150720
Ethernet1/1 queuing information:
TX Queuing
qos-group sched-type oper-bandwidth
0 WRR 3 (Best Effort)
1 WRR 17 (FCoE)
2 WRR 31 (VSAN)
3 WRR 25 (VM)
4 WRR 18 (vMotion)
5 WRR 6 (Mgmt)
RX Queuing
qos-group 0
q-size: 150720, HW MTU: 1500 (1500 configured)
drop-type: drop, xon: 0, xoff: 150720
qos-group 1
q-size: 79360, HW MTU: 2158 (2158 configured)
drop-type: no-drop, xon: 20480, xoff: 40320
qos-group 2
q-size: 22720, HW MTU: 1500 (1500 configured)
drop-type: drop, xon: 0, xoff: 22720
Statistics:
Pkts received over the port : 256270856
Ucast pkts sent to the cross-bar : 187972399
Mcast pkts sent to the cross-bar : 63629024
Ucast pkts received from the cross-bar : 1897117447
Pkts sent to the port : 2433368432
Pkts discarded on ingress : 4669433
Per-priority-pause status : Rx (Inactive), Tx (Inactive)
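Straight arithmetic on the quoted queue sizes and counters shows how little headroom the Platinum class has (illustrative math only, not a measurement):

```python
# Arithmetic on the FI queue sizes and counters quoted above.
platinum_q = 22_720        # bytes, qos-group 2 (VSAN / Platinum)
best_effort_q = 150_720    # bytes, qos-group 0 (Best Effort)
line_rate_bps = 10e9       # 10 Gb/s line-rate burst

# Time for a line-rate burst to fill each queue before discards start:
for name, q_bytes in (("Platinum", platinum_q), ("Best Effort", best_effort_q)):
    fill_us = q_bytes * 8 / line_rate_bps * 1e6
    print(f"{name}: queue fills in ~{fill_us:.1f} µs at line rate")

# Ingress discard rate from the statistics block:
ingress_discards = 4_669_433
pkts_received = 256_270_856
print(f"ingress discard rate: {100 * ingress_discards / pkts_received:.2f}%")
```

Even the Best Effort queue only buys microseconds of absorption, but roughly 6.6x more than Platinum, which is why reassigning the vSAN vNICs helps at all.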
Egress buffers were verified to be congested during a large file copy. The following command reveals the congestion on egress (reference):
nap-FI6248-VSAN-B(nxos)# show hardware internal carmel asic 0 registers match .*STA.*frh.* | i eg
Slot 0 Carmel 0 register contents:
Register Name | Offset | Value
car_bm_STA_frh_eg_addr_0 | 0x50340 | 0x1
car_bm_STA_frh_eg_addr_1 | 0x52340 | 0
car_bm_STA_frh_eg_addr_2 | 0x54340 | 0
car_bm_STA_frh_eg_addr_3 | 0x56340 | 0
car_bm_STA_frh_eg_addr_4 | 0x58340 | 0
car_bm_STA_frh_eg_addr_5 | 0x5a340 | 0
car_bm_STA_frh_eg_addr_6 | 0x5c340 | 0
car_bm_STA_frh_eg_addr_7 | 0x5e340 | 0
nap-FI6248-VSAN-B(nxos)# show hardware internal carmel asic 0 registers match .*STA.*frh.* | i eg
Slot 0 Carmel 0 register contents:
Register Name | Offset | Value
car_bm_STA_frh_eg_addr_0 | 0x50340 | 0x2
car_bm_STA_frh_eg_addr_1 | 0x52340 | 0
car_bm_STA_frh_eg_addr_2 | 0x54340 | 0
car_bm_STA_frh_eg_addr_3 | 0x56340 | 0
car_bm_STA_frh_eg_addr_4 | 0x58340 | 0
car_bm_STA_frh_eg_addr_5 | 0x5a340 | 0
car_bm_STA_frh_eg_addr_6 | 0x5c340 | 0
car_bm_STA_frh_eg_addr_7 | 0x5e340 | 0
nap-FI6248-VSAN-B(nxos)# show hardware internal carmel asic 0 registers match .*STA.*frh.* | i eg
Slot 0 Carmel 0 register contents:
Register Name | Offset | Value
car_bm_STA_frh_eg_addr_0 | 0x50340 | 0
car_bm_STA_frh_eg_addr_1 | 0x52340 | 0
car_bm_STA_frh_eg_addr_2 | 0x54340 | 0
car_bm_STA_frh_eg_addr_3 | 0x56340 | 0x1
car_bm_STA_frh_eg_addr_4 | 0x58340 | 0
car_bm_STA_frh_eg_addr_5 | 0x5a340 | 0
car_bm_STA_frh_eg_addr_6 | 0x5c340 | 0
car_bm_STA_frh_eg_addr_7 | 0x5e340 | 0
I should note we are not seeing discards or drops on any of the 'show interface' counters.
-
23. Re: Write latency and network errors
wreedMH Apr 11, 2018 3:35 PM (in response to InfiniVirt)Subscribing. I have the same issues.
-
24. Re: Write latency and network errors
JimL1651 May 1, 2018 11:58 AM (in response to wreedMH)We're having the same issue on a new 12-node, all-flash stretched cluster with RAID-5 and encryption. Write latency is very high. We have support tickets open with Dell and VMware. We've done testing with HCIBench and SQLIO using different storage policies. RAID 1 is better but still below what we consider acceptable.
The out-of-order packets were caused by having dual uplinks to two different top-of-rack switches. We resolved that by changing them to active-passive instead of active-active. We'll convert to LACP when we get a chance. Networking is all 10-gig with < 1ms latency between hosts and sites. Top-of-rack switches are Cisco Nexus 5Ks and all error counters are clean. Using iPerf from the host shell shows we can easily push greater than 9Gbit between hosts and sites with .5 to .6 ms latency.
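The reordering mechanism described above can be sketched in a few lines: spraying one flow across two uplinks with unequal latency reorders its packets, while pinning the flow to a single uplink (active/passive, or per-flow LACP hashing) keeps it in order. The latency values are made up for illustration.

```python
def deliver(packets, path_latency):
    """Return packets in arrival order, given a per-path latency table.

    Packet p is sent at time index p and takes path (p mod number-of-paths),
    i.e. round-robin spraying when the paths differ.
    """
    arrivals = sorted(
        (send_t + path_latency[p % len(path_latency)], p)
        for send_t, p in enumerate(packets)
    )
    return [p for _, p in arrivals]

packets = list(range(8))

# Two uplinks, round-robin spray, second path 3 "ticks" slower:
sprayed = deliver(packets, [0.0, 3.0])
# Single active uplink (active/passive): both entries are the same path:
pinned = deliver(packets, [0.0, 0.0])

print("sprayed:", sprayed)   # arrives out of order
print("pinned: ", pinned)    # arrives in order
assert pinned == packets
assert sprayed != packets
```

Out-of-order delivery is poison for storage TCP streams: the receiver treats gaps as possible loss, triggering duplicate ACKs and spurious retransmits even though no packet was actually dropped.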
-
25. Re: Write latency and network errors
pkonz Feb 11, 2019 10:56 AM (in response to LeslieBNS9)LeslieBNS9,
Did you end up getting a deep-buffer switch? We are having the same issue.
-
26. Re: Write latency and network errors
TolgaAsik Mar 10, 2019 3:41 AM (in response to pkonz)Hello All,
We are experiencing the same issue. Any update on a solution?
My switches are Nexus 5548UPs, and a lot of packets are being discarded on the switch ports.
-
27. Re: Write latency and network errors
sk84 Mar 10, 2019 4:51 AM (in response to TolgaAsik)Meanwhile, you can read more and more about packet discards on the switch side in All-Flash vSAN configurations. The cause often seems to be the buffer on the switch side. VMware itself gives little or no information about which switch components to use because they want to be hardware independent and don't prefer a vendor. But in my personal opinion, most Nexus switches are crap for use in vSAN all-flash configurations, especially if they're over 5 years old and have a shared buffer.
However, John Nicholson (Technical Marketing, vSAN) recently published a post on Reddit that summarizes some points to keep in mind (it's his personal opinion, not an official statement):
- Don't use Cisco FEXs. Seriously, just don't. Terrible buffers, no port-to-port capabilities. Even Cisco will tell you not to put storage on them.
- Buffers. For a lab, that 4MB-buffer marvel $1000 special might work, but really 12MB is the minimum buffer I want to see. If you want to go nuts, I've heard some lovely things about those crazy 6GB-buffer StrataDNX DUNE ASIC switches (even Cisco carries one, the Nexus 36xx I think). Dropped frames/packets/re-transmits rapidly slow down storage. That Cisco Nexus 5500 that's 8 years old and has VoQ stuff? Seriously, don't try running a heavy database on it!
- It's 2019. STOP BUYING 10Gbps stuff. 25Gbps costs very little more, and 10Gbps switches that can't do 25Gbps are likely on 4-year-old ASICs at this point.
- Mind your NIC driver/firmware. The vSphere Health team has even started writing online health checks to KBs on a few. Disable the weird PCI-E power saving if using Intel 5xx series NICs. It will cause flapping.
- LACP: if you use it, use the vDS and do an advanced hash (SRC-DST) to get proper bang for the buck. Don't use crappy IP HASH only. No shame in active/passive; it's simpler to troubleshoot and the failure behavior is cleaner.
- TURN ON CDP/LLDP in both directions!
- The only Arista issue I've seen (was another redditor complaining about vSAN performance a while back, whom we helped) was someone who mismatched his LAG policies/groups/hashes.
- Interfaces: TwinAx I like because, unlike 10Gbase-T, you don't have to worry about interference or termination, they are reasonably priced, and as long as you don't need a long run the passive ones don't cause a lot of compatibility issues.
https://www.reddit.com/r/vmware/comments/aumhvj/vsan_switches/
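One way to read the buffer figures quoted above: on a shared-buffer switch, the headline number is split among every port that bursts at the same time. A quick sketch (the 48-port count is an assumed example, not a specific switch model):

```python
# Rough per-port share of a shared buffer when many ports burst at once.
# Port count is an assumed example; real switches also carve the buffer
# into reserved and dynamic pools, which this sketch ignores.

def per_port_kb(shared_buffer_mb: float, active_ports: int) -> float:
    """Naive even split of a shared buffer across simultaneously bursting ports."""
    return shared_buffer_mb * 1024 / active_ports

for buf_mb in (4, 12):
    share = per_port_kb(buf_mb, 48)
    print(f"{buf_mb} MB shared buffer / 48 bursting ports -> ~{share:.0f} KB per port")
```

At 10 Gb/s, a port's share of a 4 MB shared buffer is gone in well under 100 µs of line-rate burst, which is consistent with the discard patterns reported earlier in this thread.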
-
28. Re: Write latency and network errors
TolgaAsik Mar 11, 2019 9:34 PM (in response to sk84)Thank you for the answer.
We are now working on the case with Cisco support. My issue is huge ingress packet discarding by the switch. To summarize, they recommend applying the following steps:
- HOLB Mitigation: Enable VOQ Limit
- HOLB Mitigation: Traffic Classification
After applying the steps, I will report back.
-
29. Re: Write latency and network errors
TolgaAsik Mar 15, 2019 6:48 AM (in response to MichaelGi)It still continues; we applied QoS using an ACL, but we couldn't resolve the issue.