Solved: Re: NFS performance and fault tolerance

ITTech2002 · ‎02-11-2009

I would like your thoughts on NFS Performance and fault tolerance....

Goal: Provide the best in active fault tolerance, throughput, and load balancing using only NFS datastores.

Given: Each ESX server has two 1000 Mb/s full-duplex NICs, one going to each switch; two switch blades on a Cisco Catalyst 6500; and four connections per storage device going to two FAS3100 series NetApp filers.

Here is what I have so far (below). Am I missing anything?

1. Networking (6.8)
a. Implement cross-stack Etherchannel
b. Create Etherchannel IP load-balancing policy
2. Storage
a. Configure NetApp networking with multimode VIFs
b. Assign aliases for each volume with a unique IP (6.7)???
3. VMware
a. When adding storage, use a unique IP for each datastore.
b. Change the vSwitch load-balancing policy to "Route based on IP hash"

Reference:

NetApp and VMware Virtual Infrastructure 3 - Storage Best Practices | TR3428 | v 4.4

kjb007 · ‎02-11-2009

If you are using a newer sup module that will allow you to "stack" two 6500s, then you're fine. Otherwise, you can't span your etherchannel across those two devices. No etherchannel means you cannot use "Route based on IP hash".

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB

JeffDrury · ‎02-11-2009

Are you using two 6500 chassis? If so then KjB is correct, no etherchannel without stacking. If you are using a single 6500 chassis and connecting to multiple switch blades then etherchannel is possible, and route based on IP hash is good.

ITTech2002 · ‎02-11-2009

Correction: we are using 3750s connected to each other with a 32 Gbit interconnect cable.

JeffDrury · ‎02-11-2009

OK, that sounds like stacked 3750s, so you should still be able to do Etherchannel. If these switches are also supporting devices on the production network, I would also recommend creating a separate VLAN strictly for your storage if you have not already done so.

kjb007 · ‎02-11-2009

Then your initial steps look good.

-KjB

ITTech2002 · ‎02-20-2009

It looks like ESX does not support "Dynamic" Etherchannel.... Only Static... Correct me if I am wrong???

Knowing this, would it be best to change the "Network Failover Detection" setting from "Link Status only" to "Beacon Probing"?

Your Thoughts???

kjb007 · ‎02-20-2009

Correct. Etherchannel is only static, meaning the channel mode needs to be "on" in your switch config.

Failure detection is a different topic from etherchannel. If your VLAN traverses upstream switches, then beacon probing can help detect failures beyond the directly connected port. Otherwise, if you only want to know when a simple disconnect or port failure occurs, link status will suffice.

-KjB

titaniumlegs · ‎02-21-2009

ESX only supports static (on), like kjb said.

NetApp supports LACP (dynamic) or static (multi). I find if you're configuring static for ESX it's just as easy while you're in there to create static for NetApp as well.

Also, if it's a cluster, which your original post kinda hints, make sure you have a separate channel for each controller (not one big channel for both), and the partner vif set on each controller.

I have this running (multiple ESX boxes, 2x 3750, FAS3070HA) right now. Quite a few NetApp customers with very similar configs.

Couple other tips:

The number of aliases/IPs on the storage network for each filer only needs to equal the number of ports on the ESX server. You're trying to balance the traffic out of the ESX servers. With enough servers (#servers >= #interfaces on the filer), traffic on the filer side will end up more or less balanced.

When assigning IP addresses, keep in mind the IP hash load balancing algorithm (which is documented in the NetApp Data ONTAP networking guides, oh, and in VMworld 2008 session TA2784):

(IP1_last_octet XOR IP2_last_octet) MOD #interfaces = interface to be selected (starting with 0)

What this works out to is that for the best chance of stuff being balanced, you want the IPs and aliases of the filer to be sequential, and the IPs of the vmkernels to be sequential. For example:

192.168.42.200 filer1-stg1

192.168.42.201 filer1-stg2

192.168.42.204 filer2-stg1

192.168.42.205 filer2-stg2

192.168.42.101 esx1-vmk

192.168.42.102 esx2-vmk

192.168.42.103 esx3-vmk

192.168.42.104 esx4-vmk

If the different groups above are not sequential, there's a good chance that the IP hash algorithm resolves to the same port, and stuff isn't balanced. It's not the only way to do it, but it's simple. I've attached a silly little perl script that illustrates this.
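The attached script is not reproduced in this thread, so here is a minimal Python sketch of the formula as quoted above. It is not the original Perl. It assumes two uplinks per ESX host (matching the two NICs in the original post) and only looks at the last octet, exactly as the formula is written here; the real ESX hash may differ in detail. The function name `uplink_index` and the two dictionaries are made up for illustration.

```python
import ipaddress


def uplink_index(src_ip: str, dst_ip: str, n_uplinks: int) -> int:
    """Apply the formula quoted in the post:
    (source last octet XOR destination last octet) MOD number of uplinks."""
    src_last = int(ipaddress.IPv4Address(src_ip)) & 0xFF
    dst_last = int(ipaddress.IPv4Address(dst_ip)) & 0xFF
    return (src_last ^ dst_last) % n_uplinks


# Addresses taken from the example list above.
FILER_ALIASES = {
    "filer1-stg1": "192.168.42.200",
    "filer1-stg2": "192.168.42.201",
    "filer2-stg1": "192.168.42.204",
    "filer2-stg2": "192.168.42.205",
}
VMKERNELS = {
    "esx1-vmk": "192.168.42.101",
    "esx2-vmk": "192.168.42.102",
    "esx3-vmk": "192.168.42.103",
    "esx4-vmk": "192.168.42.104",
}

N_UPLINKS = 2  # assumption: two NICs per ESX host, as in the original post

# Print which uplink (numbered from 0) each vmkernel port would use
# to reach each filer alias.
for vmk_name, vmk_ip in VMKERNELS.items():
    row = ", ".join(
        f"{alias}->uplink{uplink_index(vmk_ip, alias_ip, N_UPLINKS)}"
        for alias, alias_ip in FILER_ALIASES.items()
    )
    print(f"{vmk_name}: {row}")
```

Working one pair by hand: for esx1-vmk to filer1-stg1, 101 XOR 200 = 173 and 173 MOD 2 = 1; for esx1-vmk to filer1-stg2, 101 XOR 201 = 172 and 172 MOD 2 = 0. So the two sequential aliases on filer1 land on different uplinks. If the aliases were .200 and .202 instead, both would resolve to uplink 1 from esx1-vmk, which is the unbalanced case described above.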

Last tip for now: Use CDP (the little thought bubble next to the interface in the VIC in Networking) to make sure ESX is plugged in to the switch ports you think it is.

Share and enjoy! Peter If this helped you, please award points! Or beer. Or jump tickets.
