sofakng
Contributor

How does my network configuration look? (i.e. do I understand ESXi networking?)

I have a very small, single server, ESXi 5.1 host and I'm wondering if I have everything correctly configured...

Here's what I have:

vSwitch 0 - virtual machine network

  • Two physical NICs
    • NIC Teaming configured for Route based on IP hash
      • LAG configured on physical switch (Dell PowerConnect 2824)

QUESTION: What's the difference between using NIC teaming (LAG group) and just assigning the two physical NICs to my vSwitch and NOT setting up the teaming/LAG group?

vSwitch 1 - management network

  • One physical NIC
  • Contains VMkernel Port for management traffic

My goal is to have 2x NICs (teamed) for all virtual machine traffic since that's 99% of the traffic, and then use my "extra" NIC for management traffic (i.e. vSphere Client access).

The only "strange" thing I'm doing is that I'm using an "all in one" ESXi box where the ESXi host is also my NFS file server.  More specifically, I have RAID cards passed through to a VM.  This VM hosts NFS storage where my other VMs are stored.  Therefore, I must power up my file server VM and once it is powered on, I can then access my other VMs hosted on its NFS storage.  Does that make sense?

12 Replies
rickardnobel
Champion

sofakng wrote:

QUESTION: What's the difference between using NIC teaming (LAG group) and just assigning the two physical NICs to my vSwitch and NOT setting up the teaming/LAG group?

The difference is that IP Hash together with a static LAG can give a single VM higher potential throughput. With the default NIC teaming policy (Route based on originating virtual port ID), a single VM can never get more bandwidth than one physical NIC provides, but with IP Hash it can, under some conditions, use both cards at the same time. This depends mainly on how many clients are accessing the VM.

The only "strange" thing I'm doing is that I'm using an "all in one" ESXi box where the ESXi host is also my NFS file server.  More specifically, I have RAID cards passed through to a VM.  This VM hosts NFS storage where my other VMs are stored.  Therefore, I must power up my file server VM and once it is powered on, I can then access my other VMs hosted on its NFS storage.  Does that make sense?

Do you have other clients accessing this NFS server as well, except VMs inside this ESXi?

If not, are there some very specific features of this NFS server that you need?

If not, this setup could add a lot of unneeded overhead, since you could otherwise just store the VMDK files on a local datastore and have the VMkernel manage all disk access. As it is, you probably need to go through your physical switch for each disk read/write (depending on how your NFS and VMkernel IPs are set up).

My VMware blog: www.rickardnobel.se
sofakng
Contributor

OK - So if I just assigned two NICs and used the default NIC teaming policy, I do NOT have to configure LAG on my switch, right?  ...but I would still have two NICs of throughput, but each VM would be limited to one NIC throughput.  However, ALL VMs total throughput would be two NICs?

This seems like an easier solution (because LAG isn't involved) but still provides fault tolerance and load balancing...?

Also, my NFS server (hosted INSIDE the ESXi host) also provides CIFS/NFS shares for the rest of my network.

rickardnobel
Champion

sofakng wrote:

OK - So if I just assigned two NICs and used the default NIC teaming policy, I do NOT have to configure LAG on my switch, right? 

That is correct.

sofakng wrote:

...but I would still have two NICs of throughput, but each VM would be limited to one NIC throughput.  However, ALL VMs total throughput would be two NICs?

Yes, each VM will be internally "assigned" to one outgoing NIC at startup, so the combined bandwidth of the VMs will be spread over the available physical NIC ports.

This seems like an easier solution (because LAG isn't involved) but still provides fault tolerance and load balancing...?

In many cases it is an easier setup with less configuration and it does give you fault tolerance and also some kind of load balancing.
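As a rough illustration of the port ID behaviour described above, here is a minimal Python sketch. It assumes a simplified model in which each virtual port is pinned to uplink `port_id % num_uplinks`; the real VMkernel placement may differ, and the function names, port numbers and 1 Gbps link speed are all hypothetical.

```python
from collections import defaultdict

def assign_by_port_id(port_ids, num_uplinks):
    """Pin each virtual port to one uplink (simplified assumption: modulo)."""
    return {port: port % num_uplinks for port in port_ids}

def delivered_per_uplink(assignment, demand_gbps, link_gbps=1.0):
    """Sum the demand pinned to each uplink and cap it at the link speed."""
    load = defaultdict(float)
    for port, uplink in assignment.items():
        load[uplink] += demand_gbps[port]
    return {uplink: min(total, link_gbps) for uplink, total in load.items()}

# Four hypothetical VMs (virtual ports 0-3), each trying to send 0.5 Gbps.
assignment = assign_by_port_id([0, 1, 2, 3], num_uplinks=2)
many_vms = delivered_per_uplink(assignment, {0: 0.5, 1: 0.5, 2: 0.5, 3: 0.5})
print(sum(many_vms.values()))  # traffic is carried by both uplinks

# One hypothetical VM trying to send 2 Gbps stays pinned to a single uplink.
one_vm = delivered_per_uplink(assign_by_port_id([0], 2), {0: 2.0})
print(sum(one_vm.values()))  # capped at the speed of one link
```

In this model the combined traffic of several VMs can fill both links, while a single VM never exceeds one link, which matches the behaviour described in the reply.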

Also, my NFS server (hosted INSIDE the ESXi host) also provides CIFS/NFS shares for the rest of my network.

Ok, that makes sense for using the NFS server as a virtual machine for external clients. However, I would still say it could give you better performance to put the other VMs as ordinary VMDK files directly on the local storage.

sofakng
Contributor

OK - I've set up my LAG configuration but I'm not getting full performance.  (using iperf as a test)

Using iperf, when I test VM-to-VM, I'm getting 5.0+ Gbps which is correct.

Using iperf, when I test VM-to-ONE outside network PC I'm getting 1.0 Gbps, which is also correct.

However, when I test VM-to-TWO outside network PCs (i.e. running two iperf clients simultaneously from outside PCs), I'm only getting 1.0 Gbps combined.  (500 Mbps from each PC)

Because of the LAG group (2x 1.0 Gb), shouldn't each of the outside PCs be able to get 1.0 Gb/s for a combined 2.0 Gb/s?

EDIT: My server VM (inside ESXi) is using a single vmxnet3 NIC which is 10 Gb/s and because I'm setting up IP Hash load balancing, I don't need two NICs inside the VM, right?

EDIT2:  I've tested both inbound and outbound from server-to-client and both have the same result.  1.0 Gbps combined bandwidth instead of 2.0 Gbps...

EDIT3:  esxtop confirms that only one NIC is being used even with simultaneous different IP connections.

rickardnobel
Champion

sofakng wrote:

Because of the LAG group (2x 1.0 Gb), shouldn't each of the outside PCs be able to get 1.0 Gb/s for a combined 2.0 Gb/s?

The VMkernel algorithm that selects which vmnic (physical NIC port) carries outgoing traffic is not based on actual load, but on a strict calculation involving, among other things, the last octet of the IP addresses of the sender and destination.

This means that which NIC is used depends on the actual IP addresses of your external clients. For now it seems that your two external PCs, by chance, do not have suitable addresses for this. If possible, increase one address by 1 and try again.

In practice this is of course very difficult to control, but the idea is that if you have some number of clients, they will spread randomly over the vmnics and you should not have to care about the exact addresses used.
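To make the calculation concrete, here is a minimal Python sketch. It assumes the commonly documented form of the IP hash policy (XOR of the source and destination IPv4 addresses, modulo the number of uplinks); the function name and all addresses are hypothetical.

```python
import ipaddress

def ip_hash_uplink(src_ip, dst_ip, num_uplinks):
    """Return the uplink index for a source/destination pair (assumed XOR-mod model)."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % num_uplinks

server = "192.168.1.10"  # hypothetical file server VM

# Two clients whose last octets are both even collide on the same uplink.
print(ip_hash_uplink(server, "192.168.1.20", 2))
print(ip_hash_uplink(server, "192.168.1.22", 2))

# Raising one client address by 1 flips the lowest bit and selects the other uplink.
print(ip_hash_uplink(server, "192.168.1.23", 2))
```

With two uplinks only the lowest bit of the XOR matters in this model, which is why increasing one address by 1 is enough to change the selected NIC.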

EDIT: My server VM (inside ESXi) is using a single vmxnet3 NIC which is 10 Gb/s and because I'm setting up IP Hash load balancing, I don't need two NICs inside the VM, right?

You will not have to add another interface on the VMs. This is also meant to make the VMs simpler to manage than having multiple virtual NICs.

sofakng
Contributor

You're right!

I've changed the IP address and now it's load balancing across both NICs (both incoming and outgoing traffic are getting 2.0 Gbps total).

Are there any other options for load balancing to get around this problem?

Would OS-level (i.e. two NIC adapters per VM) get around it?  (but I understand this adds complexity and I think removes the need for vSwitch IP hash load balancing)

rickardnobel
Champion

sofakng wrote:

You're right!

I've changed the IP address and now it's load balancing across both NICs (both incoming and outgoing traffic are getting 2.0 Gbps total).

Nice that you got it to work! At least that is a "proof of concept" that LAG with IP Hash works. Do you have many clients? You should get a decent spread if you have a number of addresses on your external network.

However, you have now seen quite clearly both the disadvantage and the advantage of "IP Hash": with only one client, the bandwidth is limited to one NIC; with several clients you often, by chance, get a good spread, but there is also some more configuration to be done.
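One way to check in advance how a given set of clients would spread is to count the uplink chosen for each address. The Python sketch below again assumes the XOR-modulo model of IP hash; the helper names, the server address and the client range are hypothetical.

```python
import ipaddress
from collections import Counter

def uplink_for(src_ip, dst_ip, num_uplinks):
    """Uplink index under the assumed XOR-mod model of IP hash."""
    src = int(ipaddress.IPv4Address(src_ip))
    dst = int(ipaddress.IPv4Address(dst_ip))
    return (src ^ dst) % num_uplinks

def spread(server_ip, client_ips, num_uplinks=2):
    """Count how many clients land on each uplink."""
    return Counter(uplink_for(server_ip, c, num_uplinks) for c in client_ips)

server = "192.168.1.10"  # hypothetical file server VM
clients = [f"192.168.1.{n}" for n in range(100, 120)]  # 20 hypothetical clients

print(spread(server, clients))      # consecutive addresses alternate between uplinks
print(spread(server, clients[:1]))  # a single client can only ever use one uplink
```

A count that is heavily skewed toward one uplink would suggest renumbering a few clients, as was done in the test above.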

Would OS-level (i.e. two NIC adapters per VM) get around it?  (but I understand this adds complexity and I think removes the need for vSwitch IP hash load balancing)

You could in theory do it at the OS level, but the traffic would still have to pass through the physical NICs, which means you would have to create different port groups bound to different adapters, and then in some way distribute the client load between the two vNICs. It might be possible, but it increases complexity even more.

Gkeerthy
Expert

sofakng wrote:

Are there any other options for load balancing to get around this problem?

To solve this problem, VMware developed LBT (Load Based Teaming), but it is available only on the vDS. With LBT, VM traffic is automatically moved to another pNIC based on pNIC load, so there is no need to worry about LAG, network load, etc.

If these are web servers or backup/monitoring servers, we can give the VM more IP addresses and spread its traffic evenly; otherwise it just adds more complexity to the system.

Please don't forget to award point for 'Correct' or 'Helpful', if you found the comment useful. (vExpert, VCP-Cloud. VCAP5-DCD, VCP4, VCP5, MCSE, MCITP)
rickardnobel
Champion

Gopinath Keerthyrajan wrote:

With LBT, VM traffic is automatically moved to another pNIC based on pNIC load, so there is no need to worry about LAG, network load, etc.

Load Based Teaming is a very good option. However, as you say, it is only available with Enterprise Plus licensing (which the vDS requires), and it will also never give a single VM more bandwidth than one physical NIC. If the goal is really to maximize the bandwidth for single VMs, then IP Hash has to be used.
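The trade-off can be sketched with a toy model in Python. This is not VMware's implementation: it assumes a single rebalancing step that moves the lightest VM off an uplink whose load exceeds a utilization threshold, with the 75% figure taken from commonly cited descriptions of LBT and treated here as an assumption. All names and traffic figures are hypothetical.

```python
def uplink_loads(assignment, demand_gbps, num_uplinks):
    """Total demand currently pinned to each uplink."""
    loads = [0.0] * num_uplinks
    for vm, uplink in assignment.items():
        loads[uplink] += demand_gbps[vm]
    return loads

def lbt_step(assignment, demand_gbps, num_uplinks, link_gbps=1.0, threshold=0.75):
    """Move the lightest VM off the busiest uplink if it is over the threshold."""
    new = dict(assignment)
    loads = uplink_loads(new, demand_gbps, num_uplinks)
    busiest = max(range(num_uplinks), key=lambda u: loads[u])
    if loads[busiest] <= threshold * link_gbps:
        return new  # nothing is congested
    idlest = min(range(num_uplinks), key=lambda u: loads[u])
    vms = [vm for vm, u in new.items() if u == busiest]
    if len(vms) < 2:
        return new  # a lone VM is moved whole, never split across uplinks
    mover = min(vms, key=lambda vm: demand_gbps[vm])
    if loads[idlest] + demand_gbps[mover] < loads[busiest]:
        new[mover] = idlest
    return new

# Two hypothetical VMs sharing uplink 0: the lighter one is moved to uplink 1.
print(lbt_step({"fs": 0, "web": 0}, {"fs": 0.8, "web": 0.3}, num_uplinks=2))

# A single busy VM stays on one uplink, however much it wants to send.
print(lbt_step({"fs": 0}, {"fs": 2.0}, num_uplinks=2))
```

The second call shows the limitation mentioned above: rebalancing relocates whole VMs, so one VM is still bounded by a single physical NIC.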

sofakng
Contributor

Rickard Nobel wrote:

You could in theory do it at the OS level, but the traffic would still have to pass through the physical NICs, which means you would have to create different port groups bound to different adapters, and then in some way distribute the client load between the two vNICs. It might be possible, but it increases complexity even more.


If I removed IP Hash and had one vSwitch with two NICs, and then assigned two virtual NICs to a VM, would they each get "assigned" a physical NIC or could both virtual NICs attempt to route out the same physical NIC?

Rickard Nobel wrote:

Gopinath Keerthyrajan wrote:

With LBT, VM traffic is automatically moved to another pNIC based on pNIC load, so there is no need to worry about LAG, network load, etc.

Load Based Teaming is a very good option, however it is as you say only available on Enterprise Plus licencing and it will also never give a single VM more bandwidth than one physical NIC. If the goal is to really maximize the bandwidth for single VMs then IP Hash has to be used.


OK - I was going to get an evaluation of Enterprise Plus, but if it won't give a VM more than one NIC of bandwidth then it won't work for me.

My goal is fairly simple though... I have a file server VM and I want my clients (non-VM, remote network machines) to have as much throughput as possible.  For example, if two client machines are copying files I want them each to get full gigabit speed from the server...

Gkeerthy
Expert

sofakng wrote:



If I removed IP Hash and had one vSwitch with two NICs, and then assigned two virtual NICs to a VM, would they each get "assigned" a physical NIC or could both virtual NICs attempt to route out the same physical NIC?

Again, if there are 2 vNICs then the traffic will pass through 2 pNICs, but if other VMs are using those pNICs then congestion will occur; this is the case whether you use the IP hash or the port ID policy. That is where LBT comes into the picture: it ensures that a VM's traffic can use the full bandwidth of a pNIC, and if traffic from other VMs arrives, it is moved to the remaining pNICs so the original VM's traffic is not affected. In other words, it avoids congestion.


Load Based Teaming is a very good option, however it is as you say only available on Enterprise Plus licencing and it will also never give a single VM more bandwidth than one physical NIC. If the goal is to really maximize the bandwidth for single VMs then IP Hash has to be used.

OK - I was going to get an evaluation of Enterprise Plus, but if it won't give a VM more than one NIC of bandwidth then it won't work for me.

LBT only ensures that a VM will get the maximum bandwidth of one pNIC. That is why it is widely used for network-intensive applications. In the VMware architecture, once a network session is established between a VM and the outside network, it flows through only one vNIC, one pNIC and one pSwitch. The other teaming policies are not aware of congestion on the pNIC, so the intended VM gets reduced bandwidth. This only applies if many VMs share a pNIC.

So in short, LBT is best in a multi-VM environment, and IP hash is best if only one VM uses the 2 pNICs. We can use this in the NFS and iSCSI cases.

My goal is fairly simple though... I have a file server VM and I want my clients (non-VM, remote network machines) to have as much throughput as possible.  For example, if two client machines are copying files I want them each to get full gigabit speed from the server...

Again, in the real world I have never seen all the pNICs 100% utilized 24/7. Once again: if the file server VM has a dedicated vSwitch with 2 pNICs, then it is better to use IP hash. But as Rickard mentioned, IP hash only gives more bandwidth if the clients' IP addresses produce different hashes, and that is difficult to manage, so IP hash may end up giving less bandwidth. If you ensure that the clients' IPs give both 0 and 1 hashes, it will work well.

So I really recommend using LBT in this case.

Gkeerthy
Expert

For further reference, see the link below:

http://frankdenneman.nl/networking/ip-hash-versus-lbt/
