VMware Cloud Community
fgl
Enthusiast

NIC teaming and load balancing policy and routing question

I am encountering an odd behavior. I have 4 pNICs split across 2 standard vSwitches with 2 vmnics each.

vSwitch0 > vmnic0, vmnic1 > management network vmkernel and VM port groups

vSwitch1 > vmnic2, vmnic3 > iSCSI vmkernel and vmotion vmkernel

LB policy is "Route based on the originating virtual port ID"

All vmnics are active/active in each vSwitch.
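For reference, this is roughly how I confirm the teaming policy and uplinks from the host CLI (a sketch assuming ESXi 5.x esxcli syntax; the port group name is just an example from my setup):

# Teaming/failover policy and active uplinks per vSwitch
esxcli network vswitch standard policy failover get -v vSwitch0
esxcli network vswitch standard policy failover get -v vSwitch1

# Per-port-group override, e.g. for the management network
esxcli network vswitch standard portgroup policy failover get -p "Management Network"

# Physical NIC link state
esxcli network nic list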

My network guys are telling me that their packet captures show traffic coming in on vmnic1 on vSwitch0, but the outbound traffic for the same VM going out on vmnic3, which is on vSwitch1.  I don't understand how inbound traffic can come in on one vSwitch and go out on a completely different vSwitch.  Shouldn't both inbound and outbound traffic for the same VM stay on the same vSwitch, on the vmnic chosen by the originating virtual port ID?

BTW, this setup had been working fine for me until the network people switched to a new Juniper network.  They are saying it has something to do with timeouts, and they are overriding the timeout parameter for my VMware systems as a temporary fix.

Has anyone encountered this before, or have any suggestions?  I am at a loss here.

JPM300
Commander

Hey fgl,

Are you running NAS as your primary datastore for your VMs?  If so, this could be what they are seeing.  NAS doesn't bind to a specific vmkernel port, so it picks the lowest-numbered one for its first target.  If you didn't separate your traffic with different subnets/VLANs, there could be cross contamination.

Also, when traffic comes into vSwitch0, there is no way for it to get to vSwitch1 without going back out the vmnics assigned to vSwitch0.  So if you want to send traffic from vSwitch0 to vSwitch1, it will go out vSwitch0 on either vmnic0 or vmnic1, hit your physical switch, then come back into vSwitch1 via vmnic2 or vmnic3.
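If you want to double-check which uplink a given VM port is actually pinned to, something like this should show it (a sketch assuming ESXi 5.x esxcli; the world ID in the second command is a placeholder you'd take from the first command's output):

# List running VMs with their world IDs
esxcli network vm list

# Show the VM's ports and the "Team Uplink" (vmnic) actually in use
esxcli network vm port list -w 1234567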

Would you be able to provide us a screenshot of your networking?  Just blur out the names, etc.

fgl
Enthusiast

Hi jpm300,

No, I am not running NAS; all my datastores are either iSCSI or FC attached.  I forgot to mention that all of my ports are VLAN trunked, so each vSwitch carries roughly 20 VLANs; the management network, iSCSI, and vMotion are all on separate VLANs.  That's the weird part about the traffic coming in on vSwitch0 and out through vSwitch1 according to the network guys.  As you mention, this should not be possible, and I am puzzled.  I also don't understand how increasing the timeout on their Juniper firewall eliminates the problem.  I say it's something wrong with their Juniper firewall equipment, but network guys are always right and everyone else is wrong; every network person I've worked with is the same.

This is what they claim is happening.

1) vCenter (10.25.5.8) sends packets to the ESXi host (10.3.2.10), and the host receives them on vSwitch0 vmnic1. vSwitch0 is where the management network (10.3.2.10) is, and also where other VMs across different VLANs are.

2) The ESXi host (10.3.2.10) sends packets back to vCenter (10.25.5.8) out vmnic3, which is on vSwitch1. vmnic3 happens to carry the iSCSI vmkernel, which is on a different VLAN (10.5.8.8).

So I don't understand how it's even possible for vCenter traffic to come in on the management network and for the ESXi host to send the replies out through the iSCSI vmnic.
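For what it's worth, this is roughly how I've been checking which vmkernel interface owns each address, to make sure none of the subnets overlap (a sketch assuming ESXi 5.x esxcli; my vmk numbering may differ):

# IP address/netmask per VMkernel interface (management, iSCSI, vMotion)
esxcli network ip interface ipv4 get

# Which port group and vSwitch each vmk sits on
esxcli network ip interface list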

Do you think changing the load balance policy to "Route based on source MAC hash" would help?

JPM300
Commander

No, I don't think that would make a difference.  Are you using heartbeat datastores for HA in your cluster?  Some communication for the heartbeat datastore check goes across the iSCSI network.  Could this be the traffic they are seeing?

fgl
Enthusiast

Yes, I am using HA heartbeat datastores.  What the network guys are saying is that they are overriding the firewall rule so that traffic is permitted without needing to see the return traffic.  Their network/firewall is not seeing the return traffic come back out the same vmnic it went in on.  They say this is happening not just with vCenter traffic but with any VMs, like Data Recovery, Data Protection, etc., that communicate with the ESXi host.  The VMs themselves on all the hosts are fine because they are servers and don't communicate with the ESXi host for anything, so this is why they are pointing the finger at the ESXi host configuration, but my setup is a very basic VMware configuration.

My management network vmkernel has the "Management traffic" checkbox enabled. My vMotion vmkernel has the "vMotion" checkbox enabled. My iSCSI vmkernel has none of the checkboxes enabled, and I don't have iSCSI port binding enabled because I have both the vMotion and iSCSI vmkernels on the same vSwitch with 2 vmnics for redundancy.

Shouldn't all management traffic be going only through the vmkernel that has the "Management traffic" checkbox enabled?
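In case I misread the GUI, this is how I've been verifying which services are tagged on each vmkernel port (a sketch assuming ESXi 5.1+ esxcli; the vmk numbers are just my host's):

# Services tagged on each VMkernel interface
esxcli network ip interface tag get -i vmk0   # expect: Management
esxcli network ip interface tag get -i vmk1   # expect: VMotion
esxcli network ip interface tag get -i vmk2   # expect: no tags (iSCSI vmkernel)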

JPM300
Commander

It should.  You could try moving your vMotion vmkernel over to your other NICs as well.  Typically I will group the vMotion port group with my management NICs, since management doesn't require a lot of network traffic.  I try to keep vMotion traffic off my iSCSI network, as iSCSI is extremely chatty and vMotion is very bursty.  If you put a host into maintenance mode and it vMotions 20 VMs off it, you're going to put a spike on your iSCSI network / SAN, which will be sharing traffic.  This can sometimes be problematic.  Anyhow, it's something you can try and see if it changes anything.  Aside from that, I can't see why your traffic is routing oddly.  You could also check your routes:

https://pubs.vmware.com/vsphere-51/index.jsp?topic=%2Fcom.vmware.vcli.ref.doc%2Fvicfg-route.html

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=200142...
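Something along these lines should list the VMkernel routing table and default gateway (a sketch; esxcfg-route runs on the host, vicfg-route comes with the vCLI per the links above, and syntax can vary a bit by version):

# On the ESXi host: list VMkernel routes and the default gateway
esxcfg-route -l

# On ESXi 5.1+, the esxcli equivalent
esxcli network ip route ipv4 list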
