VMware Cloud Community
asp24
Enthusiast

Slow read from 10Gbit SAN connection, 1Gbit ok

I have a strange problem with my test environment.

I have a StarWind iSCSI SAN with 10Gbit and 1Gbit links to my Cisco 3750E stack.

The test ESX host has one direct 10Gbit connection to the dual-port 10Gbit card on the SAN (no switch), plus some regular 1Gbit links (-> 3750E -> SAN).

I'm waiting for more X2 modules to run 10Gbit via the switch, but most hosts will run 1Gbit anyway.

When testing I get 600-700 Mbytes/s using the directly connected 10Gbit link (not even running jumbo frames), and 120 Mbytes/s using the 1Gbit host -> 1Gbit SAN path.

The problem is when using a path from a 1Gbit link on the ESX host to the 10Gbit switch-connected link on the SAN. I get 120 Mbytes/s writes, but the reads are very slow: 20-40 Mbytes/s.

Any ideas? I have tried enabling receive flow control on the SAN interface (the 3750 does not support TX flow control), changing jumbo frames, etc. I have also tried swapping CX4 cables and ports.
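For what it's worth, the switch-side equivalents of those knobs on the 3750 look roughly like this (a sketch only; the MTU value and interface are examples, and the jumbo MTU change only takes effect after a reload):

conf t
 system mtu jumbo 9000
 interface TenGigabitEthernet1/0/1
  flowcontrol receive desired
end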

The problem seems to be the transition from 10Gbit to 1Gbit speed. Buffer problems?

Btw: I have a second SAN that is also connected to the same stack with a 10Gbit link. Using iperf I get 9Gbit/s transfer between the SAN servers.
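For reference, a minimal iperf test of that kind looks roughly like this (the IP address is just a placeholder, and the exact options are not critical):

iperf -s
iperf -c 192.168.10.2 -P 4 -t 30

The first command runs on one SAN server as the listener; the second runs on the other server and pushes four parallel TCP streams for 30 seconds.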

The NICs are Intel 82598EB (AOC-STG-I2).

[Attachment: 10gb.png]

Output of "show interfaces tenGigabitEthernet 1/0/1":

[Attachment: int.png]

asp24
Enthusiast

OK! This is strange!

I disabled QoS on the stack (conf t, no mls qos) and now I get normal reads.
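For anyone else hitting this, the full sequence is just the following (the show command at the end only confirms the new state):

conf t
 no mls qos
end
show mls qos

show mls qos should now report "QoS is disabled".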

I had not configured ANY QoS parameters after enabling it (I actually enabled it while troubleshooting this same problem), so it was the default QoS settings the stack applies when QoS is enabled that messed with the reads.

rlund
Enthusiast

I generally recommend configuring QoS for the VLAN that iSCSI is on. This makes sure the iSCSI traffic has priority over other traffic.
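A minimal sketch of that (assuming iSCSI on the standard TCP port 3260; the ACL, class-map, policy-map and interface names are just placeholders) would be to classify and mark the iSCSI traffic and let the marking drive the egress queueing:

! QoS has to be enabled globally for any of this to take effect
mls qos
!
! match iSCSI (TCP 3260) in either direction
ip access-list extended ISCSI-TRAFFIC
 permit tcp any any eq 3260
 permit tcp any eq 3260 any
!
class-map match-all ISCSI-CLASS
 match access-group name ISCSI-TRAFFIC
!
! mark iSCSI with DSCP 32 (CS4)
policy-map MARK-ISCSI
 class ISCSI-CLASS
  set dscp 32
!
! apply on the host-facing port (example interface)
interface GigabitEthernet1/0/1
 service-policy input MARK-ISCSI

The DSCP value is what the switch then uses to map the traffic into its egress queues, so the queue and threshold settings still have to make sense for it.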

Roger Lund

asp24
Enthusiast

iSCSI is on its own VLAN.

I have identified the real problem now.

The Catalyst 3560/3750 switches have very small buffers. That is no problem between same-speed interfaces, but when traffic goes from a 10Gbit interface to a 1Gbit interface the switch needs buffers to handle the speed transition (queueing).

The solution (thanks to a lot of discussions I found while doing some more research) was to modify the output queueing.

I made these modifications:

mls qos
mls qos queue-set output 1 threshold 2 3200 3200 100 3200
mls qos queue-set output 1 threshold 3 3200 3200 100 3200
mls qos queue-set output 2 threshold 2 3200 3200 100 3200
mls qos queue-set output 2 threshold 3 3200 3200 100 3200

This fixed/reduced the problem.
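As I understand the syntax, each threshold line sets the two weighted tail-drop thresholds, the reserved buffer and the maximum buffer for one egress queue in the queue-set, all as percentages of the queue's allocated buffer, so values above 100 let the 1Gbit ports borrow heavily from the common buffer pool during the 10Gbit-to-1Gbit bursts. To check the effect (the interface is just an example), these commands show the queue-set settings and the per-queue output drop counters:

show mls qos queue-set 1
show mls qos interface gigabitEthernet 1/0/1 statistics

If the drop counters keep climbing on the 1Gbit host-facing ports, the buffers are still too small.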
