Dear All
We currently have a pretty standard ESX setup comprising the following:
ESX 3.5U3 running on HP BL680 blades within a C7000 Virtual Connect chassis.
We have several Virtual Machines with MS Office Sharepoint installed.
We are attempting to load-balance traffic across two of the servers using Windows NLB.
Load balancing seems to work for around 20 minutes and then fails.
Our networking setup is as follows:
Promiscuous Mode: Reject
MAC Address Changes: Accept
Forged Transmits: Accept
Load Balancing: Route Based on IP Hash
Network Failover Detection: Link Status Only
Notify Switches: Yes
Failback: Yes
We have reproduced the exact same Application setup with physical servers in the same environment and all is OK (with same CISCO 6509 switches etc)...
Does anyone have any ideas around the ESX networking / CISCO setup that may help us?
Any help much appreciated.
Hi Bob,
Are you using Unicast or Multicast NLB? Here are my notes on MS NLB - hope they are of use:
Unicast mode - not recommended for use in VMs
Supported in ESX, but it requires setting "Notify Switches" to No on the relevant vSwitch/port group. That basically breaks VMotion for any VM on that vSwitch/port group, because ARP updates are not sent out when the VM moves hosts.
Multicast mode - VMware best practice to use this.
Supported in ESX, but many physical switches don't accept it out of the box (apparently in multicast mode the ARP response pairs a unicast IP address with a multicast MAC address, which many switches reject). In that case a static ARP entry mapping the Cluster Virtual IP to the NLB multicast MAC is required on the physical switch. This is the same for both physical and virtual machines.
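As a rough illustration of the static ARP entry mentioned above: NLB derives the cluster MAC from the Cluster Virtual IP (02-BF plus the four IP octets in unicast mode, 03-BF plus the four IP octets in multicast mode, 01-00-5E-7F plus the last two octets in IGMP multicast mode). The sketch below computes that MAC and formats a Cisco IOS style `arp` line. The IP address 10.0.0.50 is a made-up example, the helper names are hypothetical, and the exact switch commands for a given platform should be checked against the Cisco documentation.

```python
import ipaddress


def nlb_cluster_mac(cluster_ip: str, mode: str = "multicast") -> str:
    """Return the NLB cluster MAC (AA-BB-CC-DD-EE-FF) for a cluster virtual IP."""
    octets = ipaddress.IPv4Address(cluster_ip).packed  # 4 raw bytes of the IP
    if mode == "unicast":
        mac = bytes([0x02, 0xBF]) + octets
    elif mode == "multicast":
        mac = bytes([0x03, 0xBF]) + octets
    elif mode == "igmp":
        # IGMP multicast mode uses only the last two octets of the IP
        mac = bytes([0x01, 0x00, 0x5E, 0x7F]) + octets[2:]
    else:
        raise ValueError(f"unknown NLB mode: {mode}")
    return "-".join(f"{b:02X}" for b in mac)


def cisco_mac(mac: str) -> str:
    """Convert AA-BB-CC-DD-EE-FF to Cisco dotted notation aabb.ccdd.eeff."""
    digits = mac.replace("-", "").lower()
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


def static_arp_line(cluster_ip: str, mode: str = "multicast") -> str:
    """Build an IOS-style static ARP line (assumed syntax; verify per platform)."""
    return f"arp {cluster_ip} {cisco_mac(nlb_cluster_mac(cluster_ip, mode))} ARPA"


if __name__ == "__main__":
    vip = "10.0.0.50"  # hypothetical Cluster Virtual IP
    print(nlb_cluster_mac(vip, "multicast"))
    print(static_arp_line(vip, "multicast"))
```

The same static entry is needed whether the NLB nodes are physical or virtual, since it is the physical switch that rejects the unicast-IP-to-multicast-MAC mapping.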
Here is the VMware KB article - clearly recommending you use multicast:
Here is a MS article about common NLB issues and how to fix them:
Hello,
Moved to Virtual Machine and Guest OS forum.
Best regards,
Edward L. Haletky
VMware Communities User Moderator
Hi Emmar
I THOUGHT we were already using Multicast NLB, but the vmkernel logs show it as unicast...oops. Also, in the VMnet vSwitch properties I have "Notify Switches" ticked, and according to your notes this is not advisable.
We are about to re-test the NLB and will let you know the result.
Thanks again.
Bear in mind that if you stick with Unicast mode and therefore have to switch off "Notify Switches", you'll have issues with VMotion: once the VM moves from one host to the other, the vSwitch will not notify the physical switch that the VM has moved, so traffic will not automatically be sent to it on the new host.
Multicast really is the way to go, but it generally involves making changes in your pSwitch environment.
Bob, bear in mind that if your load balancing policy is IP hash, you should have EtherChannel configured on the Cisco switches. If EtherChannel is not configured, change the load balancing policy to Port ID or MAC.
Many thanks happyhammer - indeed we have EtherChannel enabled on the CISCOs.
Many thanks.
Hi,
If you use Port ID then Windows NLB will not be of any use, because VMware assigns each virtual port statically to only one vmnic adapter. It alternates VM port assignments across the vmnic team as the VMs initially connect, and they then remain on that assigned vmnic.
Do you really need NLB in a VM? I find it does not improve throughput much over VMware and Cisco EtherChannel, because even if you have it configured correctly the return path may be pigeonholed onto only one of the interfaces. Small gain, lots of complexity, no good either way.
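To make the difference between the two teaming policies concrete, here is a simplified model, not ESX's actual implementation. It assumes IP hash picks an uplink from the XOR of source and destination IP modulo the number of uplinks, and that Port ID picks an uplink from the virtual port number modulo the number of uplinks. The function names, IP addresses and port number are all made up for illustration.

```python
import ipaddress


def ip_hash_uplink(src_ip: str, dst_ip: str, uplinks: int) -> int:
    """Simplified IP hash: XOR the two addresses, then take modulo uplink count."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % uplinks


def port_id_uplink(virtual_port_id: int, uplinks: int) -> int:
    """Simplified Port ID policy: the uplink depends only on the VM's virtual port."""
    return virtual_port_id % uplinks


if __name__ == "__main__":
    vm_ip = "10.0.0.11"  # hypothetical VM address
    clients = ["10.0.0.100", "10.0.0.101", "10.0.0.102", "10.0.0.103"]
    uplinks = 2

    # IP hash: the chosen uplink varies with the remote address
    for client in clients:
        print(client, "-> vmnic index", ip_hash_uplink(vm_ip, client, uplinks))

    # Port ID: same uplink for every conversation of this VM (port 7 is hypothetical)
    print("Port ID -> vmnic index", port_id_uplink(7, uplinks))
```

Under this model a single VM can spread conversations across uplinks with IP hash, which is why the physical switch ports need to be bundled, whereas with Port ID every conversation of that VM stays on one vmnic.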
Thanks, but to make clear - the NLB is across TWO VMs which are MOSS web front-end servers so there is much to be gained by balancing web traffic between the two.
Thanks for everyone's replies on this. Have just got it fixed by adding the NLB team's MAC address to the ARP table on the two CISCO switches and enabling LACP and hence etherchanneling on that portion of the network. Balancing well between the two servers now...
Hi Emmar
The "Correct" button only appears next to your latest reply but I want to make it clear that the most crucial information which helped me get this fixed was as a result of following the recommendations in your initial reply.
Thanks again.
Glad to help. Have seen many hiccups getting this sorted in the past (hence I have my notes!!)
P.S. Don't see any correct points