VMware Cloud Community
djaquays
Enthusiast

ESX, NFS and NIC teaming

Hello,

I'm sure these questions have been answered to some degree before. I've found partial answers in my searches, but I didn't want to resurrect several threads to try to get further clarification.

We are in the process of upgrading from an EMC Fibre Channel-connected SAN to a NetApp NFS-connected NAS as our main storage device. We have 5 ESX hosts running roughly 40 guests. Our current setup is redundant 1Gb fibre for storage, 2 x 1Gb copper NICs teamed for Service Console and VMkernel traffic, and 2 x 1Gb copper NICs teamed for virtual machine traffic.

Now for the new setup: all hosts and the NAS will plug directly into our Extreme BD8808. My understanding is that all host-level NFS traffic flows through the VMkernel path; is that correct? At this point, our VM traffic "never" exceeds the throughput of a single Gb link. It seems that our available bandwidth would be much better utilized, without loss of redundancy, by teaming all 4 NICs and using VLANs to separate the VMkernel, Service Console and VM traffic.
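For the record, here is a rough service-console sketch of the single-vSwitch idea (vmnic numbering, VLAN IDs and port group names are made up, not our actual config, and it assumes vmnic0/vmnic1 are already attached to vSwitch0):

# one vSwitch carrying all four uplinks, traffic separated by VLAN-tagged port groups
esxcfg-vswitch -L vmnic2 vSwitch0                 # add the remaining two uplinks
esxcfg-vswitch -L vmnic3 vSwitch0
esxcfg-vswitch -A "VMkernel-NFS" vSwitch0         # port group for VMkernel/NFS traffic
esxcfg-vswitch -v 20 -p "VMkernel-NFS" vSwitch0   # VLAN 20 (placeholder)
esxcfg-vswitch -A "VM Network" vSwitch0           # port group for guest traffic
esxcfg-vswitch -v 30 -p "VM Network" vSwitch0     # VLAN 30 (placeholder)

Is that a reasonable approach, or is there a good reason to keep the traffic types on separate vSwitches?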

AWo
Immortal

Welcome to the forums!

I wouldn't team all four NICs in one vSwitch.

The configuration which comes to my mind is:

2 x vSwitches with 2 NICs each

1 for Console and NFS (or only NFS) and 1 for guests (and Console)

As far as I know, you should separate traffic at the vSwitch level if possible.
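A minimal sketch of that layout from the ESX service console, assuming vmnic0-vmnic3 are the four uplinks (all names are placeholders):

# vSwitch0: Service Console plus NFS/VMkernel, two uplinks
esxcfg-vswitch -L vmnic0 vSwitch0
esxcfg-vswitch -L vmnic1 vSwitch0
# vSwitch1: guest traffic on the other two uplinks
esxcfg-vswitch -a vSwitch1
esxcfg-vswitch -L vmnic2 vSwitch1
esxcfg-vswitch -L vmnic3 vSwitch1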


If you found this information useful, please consider awarding points for "Correct" or "Helpful" replies. Thanks!!


AWo

VCP / vEXPERT 2009

vExpert 2009/10/11 [:o]===[o:] [: ]o=o[ :] = Save forests! rent firewood! =
djaquays
Enthusiast

I have also read that they should be separated at the vSwitch level, but I haven't really seen an explanation of why, and I'm trying to understand the benefits of doing that versus separating via VLANs.

djaquays
Enthusiast

I'm assuming people do not typically hunt back to page 4 for threads to read.

AWo
Immortal

I'm assuming people do not typically hunt back to page 4 for threads to read.

Your statement might be wrong.

This might be more applicable: people may have a job, a family, a house, holidays where they leave the console, a time to sleep, and some more threads to work on where people might need a solution very urgently... and all the time they spend here and all the help they give is not paid.

So you may think about being a little more patient in the future.

From the VMware Performance manual:

Use separate vSwitches (and consequently separate physical network adapters) to avoid contention between service console, VMkernel, and virtual machines, and between virtual machines running heavy networking workloads.

You'll find the word "contention" in this statement. That means a virtual switch acts like a physical one: the more systems are attached, the more bandwidth is needed inside the switch to avoid contention, because if server A communicates with server B, server C has to wait to communicate with A or B.

By using two switches you lower the load, and if server A has two network ports connected to two switches, server C can communicate with server A while A talks to B at the same time.

IMHO, by using two virtual switches you give ESX the chance to handle the work in a (more) parallel manner. Please consider that my interpretation, as I do not have more technical details about it at hand.


AWo
djaquays
Enthusiast

I'm assuming people do not typically hunt back to page 4 for threads to read.

Your statement might be wrong.

This might be more applicable: people may have a job, a family, a house, holidays where they leave the console, a time to sleep, and some more threads to work on where people might need a solution very urgently... and all the time they spend here and all the help they give is not paid.

So you may think about being a little more patient in the future.

My statement had nothing to do with patience. It had to do with the understanding that in a forum that receives hundreds of replies a day, most people are not going to spend their unpaid time between their job, family, house, holidays, et al. to read back 3 days. You may think about being a little less abrasive in the future.

From the VMware Performance manual:

Use separate vSwitches (and consequently separate physical network adapters) to avoid contention between service console, VMkernel, and virtual machines, and between virtual machines running heavy networking workloads.

You'll find the word "contention" in this statement. That means a virtual switch acts like a physical one: the more systems are attached, the more bandwidth is needed inside the switch to avoid contention, because if server A communicates with server B, server C has to wait to communicate with A or B.

By using two switches you lower the load, and if server A has two network ports connected to two switches, server C can communicate with server A while A talks to B at the same time.

IMHO, by using two virtual switches you give ESX the chance to handle the work in a (more) parallel manner. Please consider that my interpretation, as I do not have more technical details about it at hand.

AWo

This makes a little more sense. I'm not sure how much it would apply to our environment, but it at least gives us a direction to research more before doing anything in production. Thanks.

kjb007
Immortal

Multiple virtual switches add an increased level of separation as well. The vSwitches are basically objects in memory; therefore, port groups on a vSwitch are part of that same memory object. Separate switches provide additional security in that they are separate objects in server memory. Provided you are using VLAN tags, the network traffic should be logically separated, but dividing virtual machine traffic from management traffic would be a better architecture. If you have the available pNICs, you could take this a step further and separate your storage traffic from your management traffic as well with a separate switch, but to be fully redundant, you would need 2 additional pNICs to do so.

At the very least, have two vSwitches, one for VM traffic and one for management and storage. When creating your port groups, don't use both pNICs as active for both port groups. Use active/standby and standby/active to make sure you physically separate traffic within that vSwitch as well.
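A rough sketch of the management/storage vSwitch (port group name and IP are placeholders; the per-port-group active/standby NIC order itself is set in the VI Client under the port group's NIC Teaming tab, not from these commands):

esxcfg-vswitch -A "VMkernel-NFS" vSwitch0                            # storage port group
esxcfg-vmknic -a -i 192.168.10.11 -n 255.255.255.0 "VMkernel-NFS"    # VMkernel interface for NFS
# Service Console keeps its own port group on the same vSwitch; then set
# vmnic0 active / vmnic1 standby for the SC port group and the reverse for
# "VMkernel-NFS", so each traffic type normally has a pNIC to itself.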

-KjB

VMware vExpert

djaquays
Enthusiast

At the very least, have two vSwitches, one for VM traffic and one for management and storage. When creating your port groups, don't use both pNICs as active for both port groups. Use active/standby and standby/active to make sure you physically separate traffic within that vSwitch as well.

-KjB

VMware vExpert

This is where I find the 2-NICs-per-vSwitch setup confusing. Unless I'm misunderstanding how things work when done the way you describe, you're suggesting that the storage network only be given a single active Gbit link per host while the VM traffic is given 2 active Gbit links. Currently we do not have a single ESX host that exceeds 30MB/s in VM traffic. I realize the calculations are probably a little fuzzy and impossible to convert perfectly because of protocol differences, but it's not unheard of for a single machine (an application DB server that's currently physical, being replaced by a similar product that will be virtual) to hit 50,000 FC frames/s on the storage links. That converts to somewhere in the neighborhood of 97MB/s, or pushing the limit of a single Gbit link for just that server.
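Roughly where that 97MB/s figure comes from, assuming full-size FC frames carrying about 2048 bytes of data payload each:

echo $(( 50000 * 2048 / 1048576 ))   # ~97 MiB/s of payload at 50,000 full frames/s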

So, assuming that 2 NICs per vSwitch is the only way to go, would it not make more sense to give the storage network 2 active NICs and then put the SC and the VM traffic on a second vSwitch using VLANs and the active/standby, standby/active scheme? Is there any concern with that?

kjb007
Immortal

The problem you'll run into is the way that the frames are load balanced/distributed over multiple pNICs. In order to distribute load over more than one pNIC, you have to have more than one src-dst combination. Otherwise, only one pNIC will ever be used. You can work around this by having multiple IPs on your NFS server, and mounting the exports over different IPs. Without this, your vmkernel IP and NFS server IP make one connection, so there's no way to split the I/O that occurs for storage. If you add a 2nd IP on your NFS server, then you are providing another src-dst combination, and are able to use another pNIC.
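For example, something along these lines (filer IPs, export paths, and datastore names are made up) mounts two exports over two different filer addresses, which gives the vmkernel two src-dst pairs to spread across the pNICs:

esxcfg-nas -a -o 192.168.10.21 -s /vol/esx_ds1 netapp_ds1   # first export via filer IP #1
esxcfg-nas -a -o 192.168.10.22 -s /vol/esx_ds2 netapp_ds2   # second export via filer IP #2
esxcfg-nas -l                                               # list the mounted NFS datastores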

Ideally, you would split your management traffic, your storage traffic, and your VM traffic all over separate vSwitches. Use 2 pNICs each for redundancy; I agree that most likely you will not need that much bandwidth, but you use two for redundancy. Since you only have 4 pNICs here, you'll have to make some concessions, and splitting your management and storage onto one vSwitch makes the most sense.

You would keep 2 for the VM traffic, because your VMs each have their own IPs, and thereby each can use a separate pNIC, so load can be balanced more effectively here than in the management/storage scenario above.

Hope that makes sense.

-KjB

VMware vExpert

djaquays
Enthusiast

The problem you'll run into is the way that the frames are load balanced/distributed over multiple pNICs. In order to distribute load over more than one pNIC, you have to have more than one src-dst combination. Otherwise, only one pNIC will ever be used. You can work around this by having multiple IPs on your NFS server, and mounting the exports over different IPs. Without this, your vmkernel IP and NFS server IP make one connection, so there's no way to split the I/O that occurs for storage. If you add a 2nd IP on your NFS server, then you are providing another src-dst combination, and are able to use another pNIC.

The NetApp and VI3 Best Practices paper from NetApp (our NFS server is a FAS2050 with 2 filers, 4 x 1Gb Cu ports per filer) suggests that when you have cross-module/switch etherchanneling (802.3ad), you use multimode vifs and multiple datastores per filer. With 2 datastores per filer, that would give us 4 different src-dst options per ESX host for load balancing across pNICs.
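If I'm reading the paper right, the filer side would look roughly like this (Data ONTAP 7-style commands; the vif name, interface names and IPs are my own placeholders, and the switch ports need a matching 802.3ad aggregation group):

vif create multi vif0 -b ip e0a e0b e0c e0d               # multimode vif, IP-based load balancing
ifconfig vif0 192.168.10.21 netmask 255.255.255.0
ifconfig vif0 alias 192.168.10.22 netmask 255.255.255.0   # second IP = second src-dst pair for ESX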

The vast majority of the time a single gigabit NIC per ESX host can handle our storage traffic without issue. The concern is having to explain at any point why our main application, that is pretty much the core of our business, is suddenly taking twice as long to do anything with "upgraded" storage infrastructure.

I'm really not trying to be a pain here. I just want to make sure that when we make our move from our current SAN to our new NAS that we do everything possible to make sure that we end up with the best performance and reliability possible. Because of that, I want to make sure that I truly understand the pros and cons to the options available before making any decisions.

Ideally, you would split your management traffic, your storage traffic, and your VM traffic all over separate vSwitches. Use 2 pNICs each for redundancy; I agree that most likely you will not need that much bandwidth, but you use two for redundancy. Since you only have 4 pNICs here, you'll have to make some concessions, and splitting your management and storage onto one vSwitch makes the most sense.

You would keep 2 for the VM traffic, because your VMs each have their own IPs, and thereby each can use a separate pNIC, so load can be balanced more effectively here than in the management/storage scenario above.

Hope that makes sense.

-KjB

VMware vExpert


kjb007
Immortal

If you have multiple IPs, then the first part is done. The 2nd part is etherchannel, which you also have. The third part would be the load balancing algorithm, so make sure you configure for ip hash to take full advantage of the etherchannel.
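To see why a single vmkernel-to-filer pair always rides one pNIC: the IP hash policy is commonly described as an XOR of the source and destination IPs, modulo the number of uplinks. A toy illustration only (the exact hash ESX uses may differ; the octets are made up):

SRC=11   # last octet of the vmkernel IP
DST=21   # last octet of the filer IP
echo "uplink index: $(( (SRC ^ DST) % 2 ))"   # same src-dst pair -> same index -> same pNIC every time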

Deciding which port groups have which pNICs attached is really up to you. If you have the IPs and can take advantage of the added pNICs, then you should definitely spread that load over the hardware that you have. You would still want to segregate the types of traffic and make sure to isolate the VMs from the other types of traffic, as a best practice. Now, best practice is not always the best way to go in every situation; it is basically a good starting point. If you want to use your resources differently for the reasons you're outlining, then you have a valid reason not to go with the "best practice," as it is not the "best" use of the resources for you.

Since you can utilize both pNICs, you don't have to use the active/standby method for your situation. Since management traffic is fairly light, except during VMotions, you could pair your VM traffic with your VMkernel port group, although for security purposes this is usually not a good idea. On the other hand, if you're using VLANs, then your traffic is at least logically separated.

In my dev environment I have blades, which only have two pNICs, so all my traffic goes over one vSwitch with all pNICs active. You could use a similar approach and add all 4 pNICs into 1 vSwitch. But troubleshooting gets very difficult when your traffic can ride over any of the 4 pNICs; if/when you run into connectivity problems, it gets that much harder to troubleshoot with 4 pNICs vs. 2. This is one big reason why I personally never add more than 2 pNICs to 1 vSwitch. If you have 4 src-dst combinations, then you can use 4 different pNICs for your storage traffic, but now you'll be splitting that bandwidth with the VMs. If for some reason one or more VMs start misbehaving, then you are risking the storage platform for all of your VMs, or at least 25% of them. This is another reason why it's best to keep your storage traffic separate and isolated from the VMs.

Ultimately, you have to judge what works best for you and what the best approach for your situation is.

-KjB

VMware vExpert

djaquays
Enthusiast

Thanks, KjB. As I said before, I just really want to make sure that I have as many answers as I can to the different paths so we can decide what the best solution for our environment is going to be. I appreciate your time entertaining my questions.

kjb007
Immortal

No problem. Glad to help answer those questions.

-KjB

VMware vExpert
