I would like your thoughts on NFS performance and fault tolerance.
Goal: Provide the best active fault tolerance, throughput, and load balancing using only NFS datastores.
Given: Each ESX server has two 1000/Full NICs, one going to each switch; there are two switch blades on a Cisco Catalyst 6500; and there are four connections per storage device going to two NetApp FAS3100 series systems.
Here is what I have so far (below). Am I missing anything?
1. Networking (6.8)
a. Implement cross-stack Etherchannel
b. Create an Etherchannel IP load-balancing policy
2. Storage
a. Configure NetApp networking with multimode VIFs
b. Assign aliases for each volume with a unique IP (6.7)???
3. VMware
a. When adding storage, use a unique IP for each datastore.
b. Set the vSwitch load-balancing policy to "Route based on IP hash"
Reference:
NetApp and VMware Virtual Infrastructure 3 - Storage Best Practices | TR3428 | v 4.4
If you are using a newer sup module that will allow you to "stack" two 6500s, then you're fine. Otherwise, you can't span your Etherchannel across those two devices. No Etherchannel means you cannot use "Route based on IP hash".
-KjB
Are you using two 6500 chassis? If so, then KjB is correct: no Etherchannel without stacking. If you are using a single 6500 chassis and connecting to multiple switch blades, then Etherchannel is possible, and "Route based on IP hash" is a good choice.
Correction: we are using 3750s connected to each other with a 32 Gbit interconnect cable.
OK, that sounds like stacked 3750s, so you should still be able to do Etherchannel. If these switches are also supporting devices on the production network, I would also recommend creating a separate VLAN strictly for your storage if you have not already done so.
Then your initial steps look good.
-KjB
It looks like ESX does not support "dynamic" Etherchannel, only static. Correct me if I am wrong.
Knowing this, would it be best to change the "Network Failover Detection" setting from "Link Status only" to "Beacon Probing"?
Your thoughts?
Correct. Etherchannel is static only, meaning the channel mode needs to be set to "on" in your switch config.
Failure detection is a different topic from Etherchannel. If your VLAN traverses upstream switches, then beacon probing can help detect a failure beyond the directly connected port. Otherwise, if you only want to know when a simple disconnect / port failure occurs, link status will suffice.
-KjB
ESX only supports static (on), like kjb said.
NetApp supports LACP (dynamic) or static (multi). I find if you're configuring static for ESX it's just as easy while you're in there to create static for NetApp as well.
Also, if it's a cluster, which your original post kinda hints, make sure you have a separate channel for each controller (not one big channel for both), and the partner vif set on each controller.
I have this running (multiple ESX boxes, 2x 3750, FAS3070HA) right now. Quite a few NetApp customers with very similar configs.
Couple other tips:
The number of aliases/IPs on the storage network for each filer only needs to equal the number of ports on the ESX server. You're trying to balance the traffic out of the ESX servers. With enough servers (#servers >= #interfaces on the filer), traffic on the filer side will end up more or less balanced.
When assigning IP addresses, keep in mind the IP hash load balancing algorithm (which is documented in the NetApp Data ONTAP networking guides, oh, and in VMworld 2008 session TA2784):
(IP1_last_octet XOR IP2_last_octet) MOD #interfaces = interface to be selected (starting with 0)
What this works out to is that for the best chance of stuff being balanced, you want the IPs and aliases of the filer to be sequential, and the IPs of the vmkernels to be sequential. For example:
192.168.42.200 filer1-stg1
192.168.42.201 filer1-stg2
192.168.42.204 filer2-stg1
192.168.42.205 filer2-stg2
192.168.42.101 esx1-vmk
192.168.42.102 esx2-vmk
192.168.42.103 esx3-vmk
192.168.42.104 esx4-vmk
If the different groups above are not sequential, there's a good chance that the IP hash algorithm resolves to the same port, and stuff isn't balanced. It's not the only way to do it, but it's simple. I've attached a silly little perl script that illustrates this.
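For anyone who wants to try the arithmetic themselves, here is a minimal Python sketch of the formula above (it is not the attached Perl script). It assumes the hash is exactly the last-octet XOR as written in this post and that each ESX server has 2 uplinks; real switch, filer, or ESX implementations may hash on more of the address, so treat it as an illustration only. The function and variable names are made up for this example.

```python
import ipaddress

def last_octet(ip: str) -> int:
    """Return the last octet of an IPv4 address as an integer."""
    return int(ipaddress.IPv4Address(ip)) & 0xFF

def select_interface(ip1: str, ip2: str, n_interfaces: int) -> int:
    """Apply the formula from the post: (octet1 XOR octet2) MOD #interfaces.

    Assumption: only the last octet is hashed, as stated in the thread.
    """
    return (last_octet(ip1) ^ last_octet(ip2)) % n_interfaces

def interfaces_used(src_ip: str, dst_ips: list, n_interfaces: int) -> set:
    """Set of interface indexes one source would use to reach all destinations."""
    return {select_interface(src_ip, dst, n_interfaces) for dst in dst_ips}

# Addresses taken from the example list in the post.
FILER_IPS = {
    "filer1-stg1": "192.168.42.200",
    "filer1-stg2": "192.168.42.201",
    "filer2-stg1": "192.168.42.204",
    "filer2-stg2": "192.168.42.205",
}
ESX_IPS = {
    "esx1-vmk": "192.168.42.101",
    "esx2-vmk": "192.168.42.102",
    "esx3-vmk": "192.168.42.103",
    "esx4-vmk": "192.168.42.104",
}

if __name__ == "__main__":
    n = 2  # assumed number of uplinks per ESX server
    for esx_name, esx_ip in ESX_IPS.items():
        for filer_name, filer_ip in FILER_IPS.items():
            idx = select_interface(esx_ip, filer_ip, n)
            print(f"{esx_name} -> {filer_name}: interface {idx}")
```

With two uplinks, esx1-vmk reaches the sequential pair .200 and .201 over different interfaces, whereas a non-sequential pair such as .200 and .202 would hash to the same interface under this formula.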
Last tip for now: Use CDP (the little thought bubble next to the interface in the VIC in Networking) to make sure ESX is plugged in to the switch ports you think it is.