VMware Cloud Community
mwaterhouse
Contributor

Link aggregation (iSCSI over 2 physical NICs)

Hi All,

Could really do with some help if anyone could be so kind...

We have ESX installed on a blade server which has two NICs dedicated to iSCSI, using IP hash load balancing in VMware.

Within a virtual machine I have two LUNs mapped: one (LUN A) to filer 192.168.x.221, and one (LUN B) to filer 192.168.x.222.

If I copy a file from the VM root to LUN A and LUN B at the same time and look at the traffic on the SAN switch (2 x cascaded Cisco 3750s), all of the copy traffic goes over just one NIC. I'm a bit new to VMware and ESX, but since the destinations are different IPs, shouldn't the traffic go through both NICs instead of just one?

The SAN switch and the two blade switch configs are attached.

Any help would be much appreciated!

17 Replies
mwaterhouse
Contributor

Sorry, forgot to mention: they're mapped via RDMs.

BenConrad
Expert

Unfortunately, this is the expected behavior for SW iSCSI. See http://communities.vmware.com/thread/51177

Note: this is how it works on EqualLogic storage, and we use QLogic iSCSI HBAs exclusively.

What is happening is that the iSCSI VMkernel IP is passing all iSCSI traffic; all the VMs go through the VMkernel IP, and there aren't really any virtual 'ports' on the vSwitch. Since all traffic comes from one IP, and since that IP address only logs into a volume once, none of the load balancing methods actually balance the load. I was surprised that even after adding multiple SAN volumes (multiple TCP/IP connections), traffic was only going over one interface on the pHost using hash balancing. In that case, both VMFS/RDM volumes ended up on the same Ethernet port on the SAN, nullifying any use of the hash.

In my testing the path will fail over on link down, but it won't fail over if the switch port hangs or if the upstream device (target) is not available. I've not been able to get beacon probing working properly.

If you use iSCSI HBAs you can achieve poor man's load balancing, and path failover on upstream failures works properly.

Ben

kjb007
Immortal

You are using a port channel, but you'll need to make sure that you turn off negotiation and set the port channel mode to on. Meaning, dynamic aggregation is not supported; it must be set statically.
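On the 3750 side that looks something like this (the interface numbers here are just placeholders for your ports):

    ! static aggregation: mode "on" means no PAgP/LACP negotiation
    interface range GigabitEthernet1/0/15 - 16
     channel-group 1 mode on
    !
    ! hash on source+destination IP to match the ESX teaming policy
    port-channel load-balance src-dst-ip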

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
mwaterhouse
Contributor

Hi again,

Thanks very much for the replies!!!

@BenConrad

Thanks. That thread is quite old now, though; maybe I'm being naive, but surely VMware has addressed this issue at some point. I've read quite a few threads on the internet that claim to have this working (although I'm not sure of their validity). Sorry if I'm totally grabbing the wrong end of the stick here, but we have a physical server which simply uses teaming software to team two physical NICs, and the traffic throughput on the two NICs is almost the same, indicating that traffic is balanced across them both. If we can do this on a cheap server in Windows, surely it's possible on VMware somehow, isn't it?

@kjb007

Thanks for that, will look into it... If you have the time (and patience!) I'd much appreciate it if you could elaborate a little on your meaning.

In general....

I didn't restart the ESX hosts after changing the load balancing to IP hash. I have now done this and see traffic on both physical NICs when copying to the two different filers, so I'm getting there. However, it would be nice to get throughput on both NICs for a single copy operation, although I'm guessing this won't be achievable as the IP hashes will be the same.

Thanks again!

BenConrad
Expert

I think you've achieved 100% of what SW iSCSI will support. Since you have more than one target IP (multiple SANs), the hash LB works, and you will be able to load balance on a per-SAN level. If we ran SW iSCSI here, we would be at the mercy of the EqualLogic TCP connection load balancing algorithm to spread the initiator across multiple Ethernet ports (IPs).

kjb007
Immortal

What I meant is that you are trying to use a port channel with ESX. ESX will use 802.3ad link aggregation, but it can't negotiate with Cisco using LACP dynamically. When you configure a team on ESX, set both NICs to active, and use IP hash load balancing, then ESX is ready to aggregate. On the physical switch side, the switch expects some negotiation to take place so it knows both sides are ready to aggregate links. This does not happen, because ESX does not negotiate dynamically. Basically, you'll need to turn negotiation off and turn the port channel mode to on. That should have the physical switch in the same mode as ESX, and you should start to see both switch ports being used.
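To sanity-check the ESX side from the service console (vSwitch and vmnic names will differ on your host; the teaming policy itself is set in the VI Client under vSwitch Properties > NIC Teaming > "Route based on ip hash"):

    # confirm both iSCSI pNICs are listed as uplinks on the iSCSI vSwitch
    esxcfg-vswitch -l
    # confirm both pNICs have link at the expected speed/duplex
    esxcfg-nics -l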

You also have the second part worked out already by having more than one IP for your targets, so you should effectively be using both pNICs and both switch ports.

-KjB


vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
mwaterhouse
Contributor

Sorry, I need to clarify: the port channel is configured between the blade uplink side of the switch and the 3750s; there is no port channel on the blade pNIC side of the switch. I've attached a diagram which I hope will clarify. Basically, the two 3020 blade switches do not have a physical link between them, so they wouldn't negotiate into any port channel, would they? Does this make sense? Please accept my apologies for any lack of understanding, and thanks again for your patience!

kjb007
Immortal

OK, if the port channel is from the blade switch to the distribution switch, then you're not aggregating on the ESX side, only on the uplinks.

My question then is: on the ESX side, how are these connected to the blade switches? Are these in-chassis switches? If they are, you may want to check spanning tree and see if ports are getting blocked; this is common in blade-switch-to-external-switch configs. Also, if you are going to separate switches, then you're not using port channels, and you shouldn't use the src-dst-ip algorithm; stick to port-based.
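A couple of quick checks on the switches (standard IOS show commands, assuming the 3020s run IOS like your 3750s):

    ! list any ports spanning tree is currently blocking
    show spanning-tree blockedports
    ! check the channel state and its member ports
    show etherchannel summary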

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
mwaterhouse
Contributor

Yes, we're only aggregating the uplinks, not the actual ports that the ESX is plugged into.

They are in-chassis switches. As far as I can tell, no ports are getting blocked: the dashboard clearly shows traffic to and from multiple ports on the blade uplinks and the corresponding ports on the external switch, and a show spanning-tree blockedports shows no ports blocked.

With regards to the last part: so basically what you are saying is that you can't use source-destination IP hashing unless you are using port channels on the ESX side of the blade switches?

Thanks again.

kjb007
Immortal

Yes, you'll find that you run into issues if you try src-dst-ip with interfaces going to multiple switches. If you are going to the same switch and are using that algorithm, your load balancing will be one-sided. ESX will attempt to balance load, and will do that on the send side, but the switch, without being configured in a port channel, will not do any load balancing. So sends will be balanced but receives will not, and that will cause problems down the road, if not immediately.
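If you do put the ports into a channel on a single switch, it's also worth confirming the switch hashes on the same criteria as ESX:

    ! shows which packet fields the switch uses to pick a channel member
    show etherchannel load-balance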

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
mwaterhouse
Contributor

Sorry for the delayed reply; I've been on annual leave.

I see your point now, thanks.

So basically I'd need the two ports in a port channel, which is not possible since the ports are on different switches.

Obviously the 3020s are not capable of StackWise, so that is not an option. So my next question is: is there any way to connect the switches together to achieve this, or any other way I can get receive traffic load balanced across the NICs?

Thanks again for your patience and time!

mwaterhouse
Contributor

Strange thing is, I've just looked at some servers we have with teamed NICs, and the traffic is balanced in both directions without any special config on the switch. Why should VMware be any different, as that too is effectively just a teamed NIC?

kjb007
Immortal

How are you seeing the traffic over the two NICs? Are you seeing load balanced in the send direction? How about the receive? Do you see packets going out both interfaces? I ask because you also mention that you are testing using a copy process between LUNs, and those LUNs are also mapped via RDM.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
mwaterhouse
Contributor

I'm using the web interface of the switches, along with the transmitted and received packet counters. I've attached screenshots.

Strange. I've done some further testing with a multiple-copy operation (copying two files to two different LUNs on different filers), and it is indeed using two NICs.

I've done the reverse (copied two files from the two different filers back to the VM), and this also appears to show two sends and two receives, so it appears to be working, although I'm not too sure.

The LUNs are mapped via RDMs, BTW... correct!

Thanks again,

mwaterhouse
Contributor

Oh, and I'm guessing the apparent low utilisation is because it is showing the utilisation of the port group as a whole, i.e. approximately 13%, since one link is an eighth of the 8-port channel (100% / 8 = 12.5%). Could anyone please confirm this?

kjb007
Immortal

What you are seeing is correct. The way the load balancing works is that every connection or transaction uses one NIC. Meaning, for one source and one destination you have one src-dst combination, which will hash to one NIC alone. When you add a second destination, a second src-dst combination, you will use a different NIC, and so on. No single src-dst combination will utilize both NICs.
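As a rough sketch of why, using the commonly cited hash (uplink = (src IP XOR dst IP) mod number-of-uplinks; when the upper octets are identical, only the last octet matters). Taking a hypothetical VMkernel IP of 192.168.x.10 and your two filers:

    .10 -> .221 : 10 XOR 221 = 215 ; 215 mod 2 = 1 -> always uplink 1
    .10 -> .222 : 10 XOR 222 = 212 ; 212 mod 2 = 0 -> uplink 0

One src-dst pair always lands on the same uplink; a second destination can land on the other.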

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
mwaterhouse
Contributor

Ahh, OK.

Thanks a lot for all your help on this question. We can live with the current setup; I was just clarifying.
