We have a pair of high-performance Dell R730 servers running ESXi 5.5 b3568722. These have Intel X540 10G NICs connected to a SuperMicro-based storage array, also with Intel 10G NICs, running Open-E DSS v7 (a certified VMware storage appliance). The array has a brand new LSI 9361 RAID controller with 1GB of cache, providing an 8TB RAID5 array with a 250GB LSI CacheCade v2 SSD cache. On paper, this should provide >30,000 IOPS, and benchmark testing within Open-E DSS shows ~960MB/s read and ~600MB/s write.

The Dell servers have point-to-point connections to the storage using CAT6A crossover cables (so no switch is involved). This is a single (Fixed) 10G path, so no MPIO. Storage is presented to the ESXi hosts over iSCSI. I've optimized the ESXi and DSS iSCSI settings to the best-performing values, so from that perspective everything looks OK. I'm running jumbo frames with an MTU of 9000; without jumbo frames, write latency climbs upwards of 250ms, which just kills performance totally.
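For anyone wanting to rule out a path-MTU mismatch on a setup like this, jumbo frames can be verified end-to-end from the ESXi shell roughly like this (the vmk interface name and target IP are placeholders for your own setup):

```shell
# Send a 9000-byte frame with "do not fragment" set; 8972 = 9000 MTU
# minus 20 bytes IP header and 8 bytes ICMP header. If this fails but
# a plain vmkping works, something in the path is not passing MTU 9000.
vmkping -I vmk1 -d -s 8972 10.0.0.10

# Confirm both the vSwitch and the VMkernel port are set to MTU 9000:
esxcli network vswitch standard list
esxcli network ip interface list
```

If the large ping fails, check the MTU on the vSwitch, the vmkernel port, and the DSS side individually, since all three have to match.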
With jumbo frames enabled, write latency is much better, but it still peaks at around 30ms, which is far from ideal. General performance is still shocking: around 60MB/s write, falling to 30MB/s on occasion when cloning from internal host storage to the external array. For a 10Gig link, this is appalling. Looking at disk stats in esxtop, KAVG/cmd is always 0, but DAVG/cmd (and thus GAVG/cmd) hits upwards of 30ms, which is the problem. As DAVG fluctuates lower, performance increases, but never really goes above 60-80MB/s.
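To track those latency counters over a whole clone operation rather than watching live, esxtop can also be run in batch mode (the delay and iteration values here are just examples):

```shell
# Capture all esxtop counters every 5 seconds for 60 samples (~5 min)
# to a CSV for offline analysis; the DAVG/KAVG/GAVG columns can then
# be graphed against the timeline of the clone job.
esxtop -b -d 5 -n 60 > esxtop-capture.csv
```

A sustained high DAVG with KAVG at zero points at the target or the path below the host, not the vmkernel storage stack itself.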
If I clone from the external array to internal ESXi storage, performance is what I'd expect for what's available: DAVG and GAVG both stay at or below 1, and write performance to the internal RAID1 mirror on the ESXi host peaks at around 200MB/s. So writes to the external array are the only problem I can see.
What I've tried:
Upgraded the Intel NIC driver from the OOB version to the current 4.4.1-iov
Set both the ESXi host and the DSS storage array to identical initiator/target settings
Tried turning Delayed Ack off on the ESXi host (no change, so I set it back on)
Enabled LRO on the DSS array (the ESXi host reports LSO is not an available function)
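In case it helps anyone check the same things, these are the CLI equivalents I used to confirm the driver and iSCSI settings (the vmnic and vmhba names are placeholders for your own devices):

```shell
# Show the driver name/version actually loaded for the 10G NIC:
esxcli network nic get -n vmnic4

# Dump the software iSCSI adapter parameters, including DelayedAck,
# to confirm the setting actually took effect:
esxcli iscsi adapter param get -A vmhba37
```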
None of this makes any noticeable difference; there's been no significant jump in performance.
I know RAID5 isn't the best for writes, but the raw benchmarks show what the array should be capable of, and I'm only getting about 10% of that.