VMware Cloud Community
SatishKumarSh
Contributor

Packet loss with IP HASH- HELP!!!

Dear All,

We have 5 ESXi servers joined in an HA cluster and want to use IP hash based load balancing. The hosts are HP BL680 G7 blade servers with HP GBE2C switches in the chassis. Each host has 6 NICs, and 2 VLANs need to be carried over them. Our 2x GBE2C switches connect to 2x Nexus 2248 switches, which connect to 2x Nexus 5010, which in turn connect to 2x Cisco 6506 core switches. The VLANs are defined on a Juniper firewall connected to the Cisco core. All 6 NICs are attached to vSwitch0, and the VM Network port group policy is set to IP HASH. In this scenario we are seeing packet loss, even with a simple ping. We changed the load balancing at the switch level and the packet loss persists. If we switch OFF either GBE2C switch in the blade chassis, everything works without issue (no packet loss), but as soon as we switch ON both switches the packet loss returns.

One more thing: if we set the load balancing to "Based on VLAN ID", everything seems to work well. We want to use IP HASH only, as that is the only policy that achieves true load balancing. We have also had the configuration checked by VMware L2 support and several other experts, and they all say the config is as per requirement. We are using ESXi v4.1. Can someone help isolate this? We have been stuck on it for the last month without a solution.

Thanks a ton in advance..

9 Replies
weinstein5
Immortal

Welcome to the Community - With IP hash based load balancing you will need to configure the physical switch ports for LACP and the correct VLAN port trunking.

If you find this or any other answer useful please consider awarding points by marking the answer correct or helpful
a_p_
Leadership

At http://kb.vmware.com/kb/1004048 you will find an example of how this needs to be configured as well as what's supported and what isn't.

André

titaniumlegs
Enthusiast

ESX vSwitches don't actually support LACP.  They support link aggregation, but not the automatic negotiation part of LACP.  The KB a_p_ referred to spells out the Cisco config:

interface GigabitEthernet1/1
<snip>

channel-group 1 mode on

The "on" mode specifically means EtherChannel without LACP or PAgP.

Here are the supported channel-group modes on Cisco, at least on IOS 12.2:

vtme-svl-4948(config-if)#channel-group 63 mode ?
  active     Enable LACP unconditionally
  auto       Enable PAgP only if a PAgP device is detected
  desirable  Enable PAgP unconditionally
  on         Enable Etherchannel only
  passive    Enable LACP only if a LACP device is detected

If you have mode active with ESX, you can get strange results.

Further, if you have two physical switches, you can only build a channel/link aggregation using ports on both switches if the switches support it and are configured for it (stack cables/ISLs, firmware and setup/config).  Cisco calls this Cross-Stack EtherChannel (3750), MEC (Multichassis EtherChannel - 6500 series) or vPC (Virtual Port Channels - Nexus).  Nortel has SMLT or MDLT.  There are others.

Share and enjoy! Peter If this helped you, please award points! Or beer. Or jump tickets.
SatishKumarSh
Contributor

Dear All,

Thanks a ton for finding the time to reply. I am not an expert on networking, so to help you narrow it down I am posting the configuration for our setup on the core switches. Please let me know if any other config is required and I will provide it.

BTH-SF-N5k-01# sh int ethernet 101/1/1 switchport
Name: Ethernet101/1/1
  Switchport: Enabled
  Switchport Monitor: Not enabled
  Operational Mode: trunk
  Access Mode VLAN: 1 (default)
  Trunking Native Mode VLAN: 1 (default)
  Trunking VLANs Enabled: 1-3967,4048-4093
  Administrative private-vlan primary host-association: none
  Administrative private-vlan secondary host-association: none
  Administrative private-vlan primary mapping: none
  Administrative private-vlan secondary mapping: none
  Administrative private-vlan trunk native VLAN: none
  Administrative private-vlan trunk encapsulation: dot1q
  Administrative private-vlan trunk normal VLANs: none
  Administrative private-vlan trunk private VLANs:
  Operational private-vlan: none
  Unknown unicast blocked: disabled
  Unknown multicast blocked: disabled

BTH-CR-6506-02#sh int tenGigabitEthernet 1/15 switchport
Name: Te1/15
Switchport: Enabled
Administrative Mode: trunk
Operational Mode: trunk (member of bundle Po4)
Administrative Trunking Encapsulation: dot1q
Operational Trunking Encapsulation: dot1q
Negotiation of Trunking: On
Access Mode VLAN: 1 (default)
Trunking Native Mode VLAN: 1 (default)
Administrative Native VLAN tagging: enabled
Operational Native VLAN tagging: disabled
Voice VLAN: none
Administrative private-vlan host-association: none
Administrative private-vlan mapping: none
Operational private-vlan: none
Trunking VLANs Enabled: ALL
Pruning VLANs Enabled: 2-1001
Capture Mode Disabled
Capture VLANs Allowed: ALL
Unknown unicast blocked: disabled
Unknown multicast blocked: disabled

We have tried so many options but nothing seems to work, so please help.

Thanks again...

titaniumlegs
Enthusiast

So, you have a c7000 blade chassis with a pair of GBE2C switches.  The GBE2Cs need to be configured for link aggregation across both of them for the ports the ESX blades use.  I'm not sure how possible this is, since I don't have one handy, and their doc (http://h20000.www2.hp.com/bc/docs/support/SupportManual/c00701973/c00701973.pdf) only shows the external ports being aggregated, not ports on both switches.  The attached diagram is an attempt to illustrate this.  I drew 4 ports based on the HP doc, but 6 ports, if supported, should work the same way.  You need to talk to HP about making this work, specifically about creating a trunk/aggregation with internal ports on two switches.

(It doesn't help that when Cisco says "trunking" they mean 802.1Q VLAN tagging, but when HP says "trunking" they mean link aggregation.)

If you create a vSwitch set to IP hash with uplinks on both GBE2C switches, and the switches don't do any kind of cross-switch link aggregation, I would fully expect lots of packet loss.

If HP can't do trunks in the fashion I describe, then you'll have to use MAC- or port-based "routing" on the vSwitch.  If you use a dvSwitch (distributed vSwitch), VMware has added a load-based teaming option, where the virtual ports of VMs and the VMkernel are assigned to a physical NIC based on the load of the NICs.

Share and enjoy! Peter If this helped you, please award points! Or beer. Or jump tickets.
titaniumlegs
Enthusiast

One more thing. 

IP hash doesn't truly balance.  Traffic from one IP address (VM or VMkernel) to another IP address will always resolve to the same uplink NIC, until the link fails or the vSwitch gains or loses a link.  The formula is:

(SRC_IP XOR DST_IP) MOD #Uplinks

If you have the following:

192.168.0.101 VM1

192.168.0.201 VMkernel

192.168.0.61 a storage device they both use

VM1 will use

101 -> 01100101
 61 -> 00111101  XOR
       --------
 88 <- 01011000

88 MOD 6 = 4

VMkernel will use

201 -> 11001001
 61 -> 00111101  XOR
       --------
244 <- 11110100

244 MOD 6 = 4

(so in this example, by coincidence, both flows land on the same uplink)

So, with lots of sources and/or destinations, you will get traffic on all links, but it won't be balanced, and no IP-to-IP connection gets the benefit of more than one uplink.
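The uplink choice above can be sketched in a few lines of Python (a rough illustration, not VMware's actual code; with the full 32-bit addresses the common octets cancel out under XOR, so only the differing octets matter):

```python
def ip_hash_uplink(src_ip: str, dst_ip: str, num_uplinks: int) -> int:
    """Roughly how ESX's IP-hash policy picks an uplink: XOR the two
    32-bit addresses, then take the result modulo the uplink count."""
    def to_int(ip: str) -> int:
        # Pack the four dotted-quad octets into one 32-bit integer.
        a, b, c, d = (int(octet) for octet in ip.split("."))
        return (a << 24) | (b << 16) | (c << 8) | d
    return (to_int(src_ip) ^ to_int(dst_ip)) % num_uplinks

# VM1 -> storage, matching the worked example above:
print(ip_hash_uplink("192.168.0.101", "192.168.0.61", 6))  # -> 4
```

Running it over a range of source addresses shows traffic landing on every uplink, but any single flow always sticking to exactly one.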

Share and enjoy! Peter If this helped you, please award points! Or beer. Or jump tickets.
SatishKumarSh
Contributor

HI,

Thanks so much for your suggestions. From your input, it seems IP hash is not the right choice here. What should we use instead? Our basic need is that traffic from all the VMs should load balance across the 6 NICs on each physical host, and that it should balance across both GBE2C blade Ethernet switches. Another question: how do people best do this in the blade server world, since every blade server environment has at least 2 switches in the chassis for redundancy?

Meanwhile, we tried creating 2 vSwitches, each given 3 NICs connected to the same Ethernet switch (GBE2C), but packet loss was still reported with the IP HASH policy.

Really appreciate your efforts, Thanks a ton for your advice...

SatishKumarSh
Contributor

Dear All,

Do you have anything to add on VMware load balancing as per our requirement? Which one should we use?

SatishKumarSh
Contributor

Hi Guys,

Thanks, we conclude that IP hash can't be done across physical switches (at least not without cross-switch link aggregation support).

Thank you all for your support..

Regards,
