Matilda (Contributor)

Strange round-robin behaviour?

Hi all,

I've just recently upgraded our SANs in the data center to HP DL380e G8 servers with the P420 controller running RAID 10 (1 GB FBWC) and 10 x 1 TB Seagate drives. It works a treat, and read speeds are around 700-800 MB/sec in the system benchmarks (running Open-E DSS V7). I've set up all 4 NICs on the SAN with IPs (10.1.1.x, 10.1.2.x, 10.1.3.x and 10.1.4.x). On our ESXi cluster I've set up 4 VMkernel ports on each ESXi host with corresponding 10.1.1.x, 10.1.2.x, 10.1.3.x and 10.1.4.x addresses. It's running iSCSI with the Round Robin path selection policy, and the ESXi hosts use LACP NIC bonding with 2 x 1-gigabit cards for a potential 2 Gbit of throughput (IP hash load balancing is enabled).

When I run a speed test inside a guest (e.g. dd if=/dev/zero of=/test.file bs=8k count=1024k, an 8 GB file), it maxes out at 98 MB/sec on only one of the network cards on the ESXi host, yet the SAN shows the load spreading evenly across all 4 of its NICs. So round robin looks like it's working on the SAN side, but ESXi is pushing all that data through a single card?
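(For anyone wanting to check the same thing on the host, commands along these lines should show the path selection policy and the active paths per device - this assumes ESXi 5.x esxcli syntax.)

# List each storage device with its Path Selection Policy and working paths
esxcli storage nmp device list

# List every path with its runtime name, adapter and target details
esxcli storage core path list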

Is there something I'm doing wrong? I have tried setting the round-robin IOPS limit to different values as a test, but still only one of the cards seems to carry all the data. I can see a bit of traffic going via the 2nd card (about 20 Mbit) from the other guests, so some traffic does use it, but I thought it would balance evenly across both links?
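(For reference, the IOPS change I was testing was along these lines - ESXi 5.x syntax, with the naa ID just a placeholder for the actual LUN identifier.)

# Make sure the LUN uses Round Robin, then lower the IOPS-per-path limit
esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
esxcli storage nmp psp roundrobin deviceconfig set --device naa.xxxxxxxxxxxxxxxx --type iops --iops 1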

I've attached the Cacti graphs for the SAN (NAS1) - you can see the nice even load across all of its links from round robin - but check out the ESX4 host: that was when I was transferring a guest to the new datastore earlier today.

Cheers, Mike

2 Replies
Matilda (Contributor)

I've just done some further testing.

For each VMkernel port I pinned the uplink as follows:

10.1.1.x - vmnic0
10.1.2.x - vmnic1
10.1.3.x - vmnic0
10.1.4.x - vmnic1

instead of each VMkernel port using both vmnic0 and vmnic1. That seems to work, so I might just leave it on that configuration.
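(In case anyone wants the CLI equivalent, pinning the uplinks per port group should look roughly like this - the port group names are just examples, substitute your own. With only one uplink listed as active, the idea is that the other ends up unused for that port group; worth double-checking in the vSphere Client afterwards.)

# Pin each iSCSI port group to a single active uplink
esxcli network vswitch standard portgroup policy failover set --portgroup-name iSCSI-1 --active-uplinks vmnic0
esxcli network vswitch standard portgroup policy failover set --portgroup-name iSCSI-2 --active-uplinks vmnic1
esxcli network vswitch standard portgroup policy failover set --portgroup-name iSCSI-3 --active-uplinks vmnic0
esxcli network vswitch standard portgroup policy failover set --portgroup-name iSCSI-4 --active-uplinks vmnic1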

admin (Immortal)

Hello,

This behavior occurs because the recommended configuration is to have only one active vmnic per iSCSI VMkernel port, with the other uplinks set to unused, and then to bind those VMkernel ports to the software iSCSI adapter so that failover and load balancing are handled by the multipathing layer rather than by NIC teaming. The documents below should help you understand the setup better.

http://kb.vmware.com/kb/1008083 (Configuring and troubleshooting basic software iSCSI setup)

http://www.vmware.com/files/pdf/techpaper/vmware-multipathing-configuration-software-iSCSI-port-bind...
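As an example, once each VMkernel port has a single active uplink, the binding itself can be done from the command line with something like the following (vmhba33 and the vmk numbers are only examples - check the adapter name with "esxcli iscsi adapter list" and substitute your own VMkernel ports):

# Bind each iSCSI VMkernel port to the software iSCSI adapter
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk1
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk2
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk3
esxcli iscsi networkportal add --adapter vmhba33 --nic vmk4

# Confirm the bindings
esxcli iscsi networkportal list --adapter vmhba33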

Thanks,

Avinash
