What benchmark result were you expecting - in other words, what would you expect if the storage were local?
That 4K random IOPS figure isn't bad at all if it's an HDD array. The best disks provide ~170 random IOPS per drive, and that doesn't scale up linearly as you add disks to the array.
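To put rough numbers on that, here's the back-of-the-envelope arithmetic in Python - the drive count, read/write mix, and RAID write penalty are illustrative assumptions, not measurements from anyone's array:

    # Rough estimate of random IOPS for a spindle array.
    # All inputs below are illustrative assumptions - plug in your own.
    drives = 24                 # hypothetical number of spindles
    per_drive_iops = 170        # ~ best case per drive, as noted above
    raid_write_penalty = 2      # RAID10: each host write costs 2 disk writes
    read_fraction = 0.7         # assumed 70/30 read/write mix

    # Reads can hit any spindle; writes are amplified by the RAID penalty.
    raw_iops = drives * per_drive_iops
    effective_iops = raw_iops / (read_fraction + (1 - read_fraction) * raid_write_penalty)
    print(f"raw: {raw_iops} IOPS, effective at 70/30 r/w: {effective_iops:.0f} IOPS")

In practice controller overhead and seek patterns keep you below even that, which is why the scaling isn't linear.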
If the array is SSD-based, it's an entirely different story of course - there the SAS controller is usually the main bottleneck.
Also, depending on whether you benchmark a bare server or a VM, you'll notice a difference at these speeds. The limit through the VMware storage stack is about 30K random IOPS combined across many threads, read or write, and, surprisingly, regardless of whether the disks are RDM or VMFS. If your SRP/iSER initiators are inside the VMs, that is reportedly a lot faster, but we have yet to test it.
I wonder how it fared in the end. We've just finished testing a monster DAS configuration and are about to network it (iSER or SRP over QDR IB).
The card-to-card latency of 3us sounds about right for IB, but there's more to it than that. 0.25ms is 250 microseconds, so besides the raw roundtrip time (around 6-10 usec through your switches), you've got roughly 240 microseconds to explain. To be honest, 0.25ms is a fantastic time for moving data through a storage system - well under what's normally achieved (consider that top-end FC arrays costing millions, filled with SSDs, have trouble doing better than 2-3ms).
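Spelling that arithmetic out, using only the figures quoted in this thread (the switch roundtrip is taken as the midpoint of the 6-10us estimate):

    # Latency budget from the figures quoted above (all in microseconds).
    observed_total = 250          # 0.25 ms end-to-end, as reported
    switch_roundtrip = 8          # midpoint of the ~6-10 us raw roundtrip estimate

    left_to_explain = observed_total - switch_roundtrip
    print(f"fabric roundtrip: ~{switch_roundtrip} us, "
          f"left to explain: ~{left_to_explain} us")
    # That remaining ~240 us is interrupts, the kernel I/O path, IB framing,
    # and the storage stack itself - not the wire.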
There are definitely some interrupts involved, and there's the time needed for the data to move from main memory through the kernel, get wrapped in IB frames, thrown onto the wire, etc.
Honestly, I don't think anything is wrong...
VCP, VCDX #52, Unix Geek, Storage Nerd
Assuming it's SSDs, the array is not RAID5 or RAID6, and the storage boxes are not caching the requests, the best latency you'll get off a single SLC SSD is in the range of 100us. RAIDing them in RAID10 or RAID0 won't improve single-thread latency, but it does improve combined performance for multiple threads.
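That per-thread vs. combined distinction is just Little's law; a quick illustration using the ~100us figure from above and assumed queue depths:

    # Little's law: throughput = outstanding I/Os / per-I/O latency.
    # The 100 us figure is the best-case single SLC SSD latency mentioned above;
    # the queue depths are illustrative assumptions.
    latency_s = 100e-6                      # ~100 us per 4K read

    print(f"one thread, QD1: ~{1 / latency_s:,.0f} IOPS")   # latency-bound, RAID width irrelevant
    for qd in (4, 16, 32):                  # more threads / deeper queues
        print(f"QD{qd}: ~{qd / latency_s:,.0f} IOPS combined (ideal scaling)")
    # RAID0/RAID10 adds spindles to absorb the concurrency, but each individual
    # request still takes ~100 us, so single-thread latency doesn't improve.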
That extra 150us might indeed be explained by wrapping/unwrapping traffic if the initiators are in VMs behind hypervisors, or within the hypervisors themselves - which in this case we don't know.
What's more interesting, though, is that in the DAS scenario there is no difference between RDM disks and VMFS storage made of the same SSDs. Bare-server performance on 16 Intel X25-E SSDs in RAID10 is about 300K/80K read/write random IOPS on 4K blocks, while it's no more than 30K/30K for VMs running in the VMware hypervisor on the same server off the same array - combined across multiple threads running in a few VMs.
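If anyone wants a quick way to reproduce the bare-metal vs. VM comparison without a full fio run, here's a minimal single-threaded 4K random read sketch - Python on Linux, the device path is a placeholder you'd substitute yourself, and it runs at queue depth 1, so it measures latency rather than peak IOPS:

    import mmap, os, random, time

    DEV = "/dev/sdX"          # placeholder - point it at your LUN / RDM / VMFS-backed disk
    BLOCK = 4096              # 4K blocks, as in the numbers above
    RUNTIME = 10              # seconds

    fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)   # bypass the page cache
    size = os.lseek(fd, 0, os.SEEK_END)
    buf = mmap.mmap(-1, BLOCK)                     # page-aligned buffer, required by O_DIRECT
    blocks = size // BLOCK

    ios, start = 0, time.time()
    while time.time() - start < RUNTIME:
        os.preadv(fd, [buf], random.randrange(blocks) * BLOCK)   # one 4K random read
        ios += 1

    elapsed = time.time() - start
    print(f"{ios / elapsed:,.0f} IOPS, avg latency {1e6 * elapsed / ios:.0f} us (QD1, single thread)")
    os.close(fd)

Run it on the bare server and again inside a guest against the same SSDs and the hypervisor overhead shows up directly; for the combined multi-thread numbers you'd still want a proper fio run.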