Solved: Re: NIC teaming + load balancing with HP Blades, V...

PabloOttawa · ‎05-26-2009

Hi,

I am configuring an ESX infrastructure running on HP Blades (c7000 enclosure), VirtualConnect, and Cisco 3750 switches for the uplink. My network configuration is based on the HP VirtualConnect Cookbook, Scenario 11.

On the ESX server, I have configured a vSwitch with two teamed network cards. Each card can see a different VirtualConnect network. Each VC network has SmartLink activated.

Each VirtualConnect network is associated with two ports on a VC bay. Those two ports are the uplinks to the Cisco 3750 switches. The Cisco switches are configured in a stack, and each port associated with the VC network is configured with LACP aggregation.

Here are the facts:

- Each network (Host-A and Host-B) is properly configured for LACP, that is, the links are shown as Active/Active in VC manager.

- Communication from ESX is good.

- Failover and failback work OK.

The problem: I can't get load balancing to work. All VMs use a single pNIC. I have tried different algorithms on the ESX server (source port ID, MAC hash, IP hash) and configured the equivalent on the Cisco side.

I have attached a diagram of the physical network as well as the configuration for the ports.

What am I missing? Many thanks,

Pablo

kjb007 · ‎05-26-2009

To load balance, the only algorithm you can use is IP hash. After that is configured, you only load balance different src-dst IP combinations. Meaning, if all NICs are active, then your vm to IP1 on your network should use NIC1, and your vm to IP2 on your network should use NIC2.

Are all your NICs active on the ESX host? If your virtual connect is in Bay5, then your matching NIC should be in mezz 2. This would give you 2/4 NICs on ESX? How is your virtual networking configured? Can you post a screenshot of that?

-KjB

VMware vExpert

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

PabloOttawa · ‎05-26-2009

Thanks KjB for the reply;

I have tried IP hash load balancing, configured on both the vSwitches and the Cisco switches. To test, I have two VMs copying files to two different external hosts:

VM1: 172.16.49.70 -> 172.16.49.12

VM2: 172.16.49.71 -> 172.16.49.13

Let's call this two-to-two. I have tried other combinations, like one-to-two and two-to-one, and in all cases only a single adapter is used.
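A side note on the test pairs themselves. The sketch below is a simplified model, assuming that "Route based on IP hash" XORs the source and destination IPv4 addresses and takes the result modulo the number of uplinks; the function name `ip_hash_uplink` is made up for illustration and the real ESX implementation may differ in detail. Under that assumption, the two-to-two pairs above land on the same uplink index, so that particular test may not be able to show balancing even when the policy works. The one-to-two case would pick different uplinks, so it does not explain every result reported here.

```python
import ipaddress

def ip_hash_uplink(src: str, dst: str, n_uplinks: int) -> int:
    """Simplified model (assumption): XOR both IPv4 addresses, modulo uplink count."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % n_uplinks

# Address pairs taken from the test described above, plus one one-to-two pair.
flows = [
    ("172.16.49.70", "172.16.49.12"),
    ("172.16.49.71", "172.16.49.13"),
    ("172.16.49.70", "172.16.49.13"),
]

for src, dst in flows:
    print(src, "->", dst, "uplink index", ip_hash_uplink(src, dst, 2))
```

With two uplinks, the first two pairs both give index 0 and the third gives index 1. Picking destination addresses whose last octets differ in parity for the same source is a simple way to build a test that this model would spread across both NICs.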

My question is this: it usually works like a charm when the ESX server NICs are directly connected to the switch; I have done this on Cisco and Nortel switches. Here, however, I have the VirtualConnect switches in the middle. Are you aware of any specific settings that should be considered?

Thanks,

Pablo

kjb007 · ‎05-26-2009

Looking back at your network diagram, I think I see the problem. You have two different channels: one used by the Host-A network and one used by the Host-B network. Your diagram suggests you are connecting one interface from your blade to each network, so you are not really creating a channel from an ESX perspective. Try pointing both NICs to one network, and see if you start using both NICs.

-KjB

PabloOttawa · ‎05-26-2009

I haven't tried that yet, but the problem with that configuration (pointing both NICs to the same network) is that if my VirtualConnect bay fails, the whole thing goes down. I would have a single point of failure.

The two VirtualConnect networks, connected to two bays with LACP trunks, were created following the documents "HP VirtualConnect Cookbook" (Scenario 11) and "HP VirtualConnect for the Cisco Administrator" (Advanced Design example #4).

Cheers,

P.

kjb007 · ‎05-26-2009

Agreed, the configuration is correct, but what you have done in that case is create a channel by which virtual connect can use either interface it has available to talk to the network, thereby making both connections active. Otherwise, your vConnect would be active/standby. What you have not done is create the same configuration from an ESX perspective. The problem here is that you cannot channel across multiple vConnect bays, which is what you would need to do to use both bays concurrently in the way you are trying to. To team in this fashion for ESX, both interfaces have to be in the same channel group from the ESX perspective, which these are not.
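One way to picture why the channel group matters is a toy model of MAC learning on the upstream switch. This is only an illustrative sketch: the port-channel names `Po1` and `Po2`, the MAC address, and the function `count_mac_moves` are hypothetical, and a real Cisco stack may log, rate-limit, or otherwise handle a moving MAC differently. The point it shows is that when one VM's frames alternate between two uplinks that are not in the same channel, the switch sees the same source MAC hopping between two unrelated ports, whereas pinning the VM to one uplink keeps the MAC stable.

```python
def count_mac_moves(frames):
    """Count how often a learning switch would see a source MAC change ports.

    frames: iterable of (source_mac, ingress_port) in arrival order.
    """
    table = {}   # source MAC -> port where it was last seen
    moves = 0
    for mac, port in frames:
        if mac in table and table[mac] != port:
            moves += 1          # same MAC now appears on a different port
        table[mac] = port
    return moves

VM_MAC = "00:50:56:aa:bb:01"    # hypothetical VM MAC address

# Per-flow hashing with no channel spanning both uplinks:
# the same VM MAC shows up on both port-channels in turn.
ip_hash_frames = [(VM_MAC, "Po1"), (VM_MAC, "Po2"), (VM_MAC, "Po1"), (VM_MAC, "Po2")]

# Port-ID pinning: the VM always leaves through the same uplink.
port_id_frames = [(VM_MAC, "Po1")] * 4

print("moves with per-flow hashing:", count_mac_moves(ip_hash_frames))
print("moves with port-ID pinning:", count_mac_moves(port_id_frames))
```

In this model the alternating sequence produces three moves and the pinned sequence produces none.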

-KjB

JarrettCampbell · ‎05-26-2009

Have you tried it without LACP? Can you do regular EtherChannel across those switches?

Using: channel-group mode on instead of active

Set NIC teaming to IP HASH

Also, take out your spanning tree config; I don't believe it is necessary.

PabloOttawa · ‎05-27-2009

Hi KjB, please see my comments below. Thanks for your feedback!

>>Agreed, the configuration is correct, but what you have done in that case is create a channel by which virtual connect can use either interface it has available to it to talk to the network, thereby making both connections active. Otherwise, your vConnect would be active/standby.

R: Correct, this has been achieved by creating the LACP trunks.

>>What you have not done is create the same configuration from an ESX perspective. The problem here is that you cannot channel across multiple vConnect bays, which is what you would need to do to use both bays concurrently in the way you are trying to.

R: Correct, I can't channel across multiple vConnect bays. But the channeling is meant to aggregate bandwidth on the uplink, not to team the ESX NICs.

>>To team in this fashion for ESX, both interfaces from ESX perspective have to be in the same channel group, which these are not.

R: That's why HP instructs you to create two discrete VirtualConnect networks. Each discrete network has an uplink on its own interconnect bay, and from there to the switches. From the ESX point of view, these should appear as two different interfaces; at the very least, originating port ID load balancing should work.

I attached a copy of the example found in the document "HP VirtualConnect for the Cisco Administrator".

KjB, what would be your suggestion? What would you change on the design to make it work?

Many thanks,

Pablo

PabloOttawa · ‎05-27-2009

Hi Jarrett,

Thanks for your answer. Without LACP, I would lose the aggregated bandwidth. On the other hand, regular EtherChannel (meaning the Cisco config we usually do) doesn't work, as VirtualConnect does not support PAgP (the Cisco protocol) but supports LACP instead.

Anyway, it is a valid suggestion to try, just to see whether LACP and NIC teaming are mutually exclusive, so I removed LACP and tried. Unfortunately it didn't work (actually, I lost all connections).

Thanks for your feedback, keep the ideas flowing!

Pablo

kjb007 · ‎05-27-2009

I would forcibly verify that both interfaces (from the ESX perspective) work correctly. Have VMs on the ESX server, with both NICs available. Set your policy to 'route based on src port id'. Remove one NIC from the team and make sure that side works. Then add that NIC back in, remove the second NIC, and make sure that path works. If it does, both paths are verified. To balance the VMs across both NICs, make sure both NICs are available. If you are still only getting through one NIC, check STP and make sure spanning tree is not blocking any of your switch ports.

-KjB

PabloOttawa · ‎05-29-2009

Hey KjB, please see my comments below:

>>I would forcibly verify that both interfaces (from esx perspective) work correctly. Have vm's on the ESX server, with both NICs available. Set your policy to 'route based on src port id' Remove one NIC from the team, make sure that side works. Then, add that NIC back in, and remove the 2nd NIC, and make sure that path works. If that path does work, then both paths are verified.

Yes, that was working OK; it was one of the first things I verified.

>> To balance the vm's across both NICs, make sure both NICs are available. If you are only getting through one NIC still, then you should check STP, and make sure spanning tree is not blocking any of your switch ports.

I think that was part of the trick. I configured the switch ports with "spanning-tree bpduguard enable".

Now, the strangest part: even after the changes above, it still wasn't working.

So I created another port group on the vSwitch and connected two VMs to the first port group and two VMs to the second. And all of a sudden, some load balancing started to happen. Three quarters of the traffic was going through one NIC while the other quarter was taking the second.

I thought it was something related to MAC addresses, so I went to the vSwitch configuration and changed the security parameters to Promiscuous Mode: Reject, leaving the other two set to Accept. And voilà, it started splitting the traffic equally between the ports. Why would that be?

On a final note, what LB algorithm should I use? IP hash is the best; however, EtherChannel is not configured on the Cisco side towards ESX. Only static LACP is configured, and that is to create the aggregated trunk from each VirtualConnect bay; I can't trunk those two together as well. So I was thinking port ID. Thoughts?

Thanks for your help, again,

Pablo

kjb007 · ‎05-29-2009

Since you're technically not creating a channel from an ESX perspective, you don't want to use IP hash. If ESX tries to send traffic through both NICs, the Cisco switch should drop those packets as part of its loop-avoidance algorithms. You should be able to check that in the switch port statistics. You will need to use originating virtual switch port ID (port ID) instead.
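To make the port ID behaviour concrete, here is a minimal sketch. It assumes each virtual switch port is pinned to one uplink and models the choice as port ID modulo the number of uplinks; that modulo rule, the function names, and the port IDs are illustrative assumptions, since the actual placement is decided internally by ESX. What it illustrates is that balancing happens per virtual port rather than per flow, so with only a handful of VMs an uneven split such as the three-to-one ratio reported earlier in the thread is entirely possible.

```python
from collections import Counter

def port_id_uplink(virtual_port_id: int, n_uplinks: int) -> int:
    """Assumed model: a virtual port stays pinned to uplink (port ID mod uplink count)."""
    return virtual_port_id % n_uplinks

def distribution(port_ids, n_uplinks=2):
    """Count how many virtual ports land on each uplink index."""
    return Counter(port_id_uplink(p, n_uplinks) for p in port_ids)

# Hypothetical port IDs for four VMs.
uneven = distribution([8, 10, 12, 13])   # three ports on uplink 0, one on uplink 1
even = distribution([8, 9, 10, 11])      # two ports on each uplink

print("uneven:", dict(uneven))
print("even:", dict(even))
```

Because each VM's vNIC stays on one uplink, the upstream switches always see a given MAC on a single port, which is why this policy needs no channel configuration on the ESX-facing side.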

Not sure about the promiscuous mode. That should not have any bearing as far as load balancing is concerned, unless the port group was not properly inheriting the vSwitch properties?

-KjB

PabloOttawa · ‎06-01-2009

Thanks very much for your advice, KjB. I have switched the load balancing mode to port ID and will see how it goes.

Cheers,

Pablo

kjb007 · ‎06-01-2009

You're welcome. Don't forget to leave points for helpful / correct posts.

-KjB

NIC teaming + load balancing with HP Blades, VirtualConnect, and Cisco