We are having the very same issue, albeit much worse. We are seeing latencies surpassing 1400 ms (!) on a relatively empty 12-node VSAN stretched cluster (SR# 18750505903). The link between sites is less than 30% utilized with >1ms latency. The issue was discovered when a SQL server w/ a 1.5TB DB was migrated into the cluster and began having major application issues.
VSAN 6.2, ESXi 6.0.0 build 3620759.
Cisco UCS C240M4 hardware with Enterprise-grade SAS SSDs/HDDs. The cluster is completely symmetrical. Each host consists of 2 disk groups of 8 disks: (1) 400GB Enterprise SAS SSD / (7) 1.2 TB 10K SAS HDD. The VSAN HCL has been validated multiple times for incorrect drivers, firmware and even hardware. All check out.
I'm not seeing any pause frames on the upstream UCS Fabric Interconnects. Flow Control is not configured either, nor does it appear to be configurable on the VIC 1227:
[root@-------vsan-06:~] esxcli system module parameters list -m enic
Name               Type  Value  Description
-----------------  ----  -----  -------------------------------------------------------------------------
heap_initial       int          Initial heap size allocated for the driver.
heap_max           int          Maximum attainable heap size for the driver.
skb_mpool_initial  int          Driver's minimum private socket buffer memory pool size.
skb_mpool_max      int          Maximum attainable private socket buffer memory pool size for the driver.
[root@-------vsan-06:~] ethtool -a vmnic5
Pause parameters for vmnic5:
Cannot get device pause settings: Operation not supported
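For anyone wanting to cross-check the pause-frame side from the host itself, here is a rough sketch of filtering the NIC statistics for pause counters. The helper name `pause_counters` is made up for illustration, and it assumes the enic driver reports such counters under a name containing "pause", which it may not.

```shell
# Hypothetical helper (not part of ESXi): reads NIC statistics text on stdin
# and prints only the lines that mention "pause", case-insensitively.
# Prints a short note instead when no such line is present.
pause_counters() {
  grep -i 'pause' || echo "no pause counters reported"
}

# Possible usage on a host (assumes the driver exposes these statistics):
#   ethtool -S vmnic5 | pause_counters
#   esxcli network nic stats get -n vmnic5 | pause_counters
```

If nothing pause-related shows up, that only means the driver doesn't report it here; the Fabric Interconnect counters are still the better source.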
Per KB2146267, I tried disabling the dedup scanner, but this did not improve anything. I also updated the pNIC drivers, and that didn't help either.