VMware Cloud Community
vmwareluverz
Contributor

NetApp & NFS in production?

I'm in the planning stage for using NetApp NFS as the datastore for ESX 3.5 production servers, and I'm planning to host 300 virtual machines. I'm curious whether anyone has experience with a NetApp NFS setup on ESX 3.5, and how the performance compares to iSCSI and FC. Do you follow general best practices and do any tuning? If so, can you point me to links or discussions?

For NFS, what is the maximum number of virtual machines that can be hosted on ESX 3.5, or at least a formula we can use to maximize performance? Do you recommend NFS at all?

10 Replies
aldikan
Hot Shot

Good questions,

I am very interested to see whether anybody can provide answers, as we are planning to go this route in the future.

Thanks

Alex

jeremypage
Enthusiast

According to the toasters list (toasters@mathworks.com), the highest recommended figure was 50 VMs per volume; I'm running in that range now with ~400 VMs on a NetApp 3070A cluster. To tell the truth, since it's over NFS you don't have to worry about a LUN getting locked by a specific ESX host for I/O, so we actually saw a performance increase over our old setup (FC-attached LUNs over 2 Gb FC with ~8 machines per LUN).

NFS on NetApp is by far the best thing that's happened to our datacenter in years; I am so pleased with how well it's working.

> I'm curious whether anyone has experience with a NetApp NFS setup on ESX 3.5, and how the performance compares to iSCSI and FC.

Similar performance, but at peak loads it's actually faster. That's an apples-to-oranges comparison, though, since the disk subsystems were different (DS4800 versus 3070A).

> Do you follow general best practices and do any tuning? If so, can you point me to links or discussions?

Yes, NetApp has a couple of white papers; also talk to your PSE, and he can help you out with settings.

> For NFS, what is the maximum number of virtual machines that can be hosted on ESX 3.5, or at least a formula we can use to maximize performance? Do you recommend NFS at all?

Like I said above, we are targeting 50 per NFS export, but I really don't think it matters much, since the locking is done per VM, not per mount point.
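One ceiling to keep in mind: ESX 3.5 only mounts 8 NFS datastores by default, so 300 VMs at ~50 per export already means raising that limit. A rough sketch of the advanced settings involved (from memory; double-check the NetApp/VMware papers for the current recommended values before touching a production host):

    # Raise the number of NFS datastores an ESX 3.5 host can mount (default is 8, max 32)
    esxcfg-advcfg -s 32 /NFS/MaxVolumes

    # Companion TCP/IP heap tweaks usually recommended alongside more NFS mounts
    esxcfg-advcfg -s 30 /Net/TcpipHeapSize
    esxcfg-advcfg -s 120 /Net/TcpipHeapMax

    # Verify the new value took effect
    esxcfg-advcfg -g /NFS/MaxVolumes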

A few things to look at:

How are you going to establish failover? Since NFS points at an IP address/hostname, you can't have it fail over without bringing down your boxes. We used Cisco 3750E switches, stacked, so if one of the switches fails we can still roll.

How are you going to set up your network? We have a dedicated 10 Gb network to the switch (dual-connected across switches via a VIF). Please note this will disable the TOE capability of your 10 Gb NICs in the filers. We also enabled a 7500-byte MTU (sketched below); this is not officially supported by VMware but has worked well for us.
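For reference, the filer side of that looks roughly like the following (interface names, addresses, and the vSwitch name are placeholders; the larger MTU has to match end to end on every switch port in the path):

    # Data ONTAP 7.x: multi-mode VIF across the two 10 Gb ports, with the larger MTU
    vif create multi vif10g e1a e1b
    ifconfig vif10g 192.168.50.5 netmask 255.255.255.0 mtusize 7500 up

    # ESX 3.5 side: matching MTU on the vSwitch that carries the NFS VMkernel port
    esxcfg-vswitch -m 7500 vSwitch1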

azn2kew
Champion

Reading these blogs will help you decide and understand NFS-over-NetApp solutions precisely.

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

Regards,

Stefan Nguyen

iGeek Systems Inc.

VMware, Citrix, Microsoft Consultant

VMware vExpert 2009, VCP 3 & 4, VSP, VTSP, CCA, CCEA, CCNA, MCSA, EMCSE, EMCISA
vmwareluverz
Contributor

Jeremy,

How do you implement a failover solution for NFS? Does anyone have a better NFS failover design as well?

kjb007
Immortal

You have to make sure you have redundant paths to your storage. The vSwitch you use to connect to your storage should have a team of NICs connected to it. While you can't load balance over the NICs directly since you're connecting to one host, you can still leverage active/standby NICs to provide hardware redundancy. That being said, make sure you have multiple controllers so a filer outage doesn't cause your environment to go down. Your physical NICs should connect to separate physical switches so no switch failure can take you down. The best practices for NFS are pretty much best practices all around.
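As a rough illustration, the service-console commands for that kind of dedicated, dual-homed NFS vSwitch on ESX 3.5 look like this (vSwitch, vmnic, and port group names are placeholders; the actual active/standby failover order is set in the NIC teaming policy through the VI Client):

    # Dedicated vSwitch for IP storage, with two uplinks cabled to separate physical switches
    esxcfg-vswitch -a vSwitch1
    esxcfg-vswitch -L vmnic2 vSwitch1
    esxcfg-vswitch -L vmnic3 vSwitch1

    # Port group and VMkernel interface used to reach the filer
    esxcfg-vswitch -A NFS vSwitch1
    esxcfg-vmknic -a -i 192.168.50.21 -n 255.255.255.0 NFS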

Good luck,

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
jeremypage
Enthusiast

You can get decent load balancing too. Our current setup is dual 1 Gb NICs in each ESX host dedicated to NFS, in an EtherChannel to the switch, and then 10 Gb from the switch to the filer. They load balance quite well. I'm pretty sure you get an NFS connection per VM, although I haven't looked into it, because just looking at my switch statistics it's about a 40/60 packet split on ports 0/1, which is good enough for me. This way you can have a port from your VIF on each switch, so even a switch failure will not take you down.

It would be nice if NetApp made the TOE functionality work on a 10 Gb VIF, and it would also rock if there were support for LACP all the way through, although that might be a Cisco limitation with the stacked 3750Es and not the filer; it's relatively new on both and I don't remember which one didn't support it off the top of my head. Still, EtherChannel works fine (see the sketch below); we've got ~70 VMs on each host sharing those two 1 Gb connections, although to be fair most of those boxes are running Java/Apache, so they are more memory pigs than anything else.
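For what it's worth, the switch side of that setup looks roughly like this (a sketch only; port, VLAN, and channel-group numbers are placeholders, and it's a static EtherChannel with src-dst-ip hashing, which is what the ESX "route based on IP hash" teaming policy expects):

    ! Cross-stack EtherChannel to the two ESX NICs, one port on each stacked 3750E
    interface range GigabitEthernet1/0/10 , GigabitEthernet2/0/10
     switchport mode access
     switchport access vlan 50
     channel-group 10 mode on
    !
    ! Hash on source + destination IP so flows spread across the channel members
    port-channel load-balance src-dst-ip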

Anyhow, I am very happy with it. It just plain works, and frankly it's FAR easier to support and troubleshoot than your run-of-the-mill mid-level SAN implementation. Not to mention we've got a few TB of space for user directories too, which is both faster than serving them from VMs and saves us a few Windows licenses.

kjb007
Immortal

Without a matching hashing algorithm, the NFS traffic from ESX to the NetApp will not span multiple pNICs. If you have a means of putting multiple IP addresses on the NFS server, and you mount exports via those distinct IP addresses, you will get balanced traffic. Without a way to separate source and destination, there is no way to split the traffic. That is the inherent difference: you can't have multiple paths to NFS without some kind of abstraction being done on the server side. Some arrays will do this by introducing some type of intermediary, but from the ESX side, you will again be using one pNIC to get there.

In a port channel, you again will have to specify the algorithm to use. If you are trunking over that port channel as well, then you will use your pNICs more efficiently, but that is because you have multiple IP hashes to work with and can split packets between the physical interfaces.
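To make that concrete, a sketch of the multiple-IP approach (the addresses, export paths, and datastore labels below are made up; the point is that each datastore is mounted through a different filer address, so the source/destination pairs differ and the hashing has something to split on):

    # Data ONTAP: add a second address on the NFS-facing interface
    ifconfig vif10g alias 192.168.50.6 netmask 255.255.255.0

    # ESX 3.5: mount each export through a different filer address
    esxcfg-nas -a -o 192.168.50.5 -s /vol/vm_vol1 netapp_ds1
    esxcfg-nas -a -o 192.168.50.6 -s /vol/vm_vol2 netapp_ds2

    # List the NFS datastores the host has mounted
    esxcfg-nas -l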

Hope that makes sense.

-KjB

vExpert/VCP/VCAP vmwise.com / @vmwise -KjB
vmwareluverz
Contributor

Are there any step-by-step guides for setting up NFS and NetApp, or good best-practices documents I can read?

dpkenn
Contributor

Here are some useful documents from NetApp. Performance-wise, FC still outperforms iSCSI and NFS. How much better is the question, and the decision to go that route is one you will have to make based on your environment.

Hope these documents are useful.

Thanks

evilensky
Enthusiast

Performance-wise, is this due to the almost always higher bandwidth of an FC SAN, or to other factors?

Specifically, I am wondering: are the bandwidth and overhead of the three protocols reasonably comparable? For example, I have ESX hosts using dual-pathed 2 Gb FCP ports, but according to the FC switch, no single port reaches more than 40 Mbps peak utilization.

If the bandwidth numbers are comparable, and assuming we put the proper infrastructure in place with regard to reliability, could we in theory consolidate these ESX servers onto just a handful of 1 GigE ports without incurring degradation from a throughput perspective?
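Just to put numbers on that, my back-of-envelope math (nominal line rates only, ignoring protocol overhead, and using the 40 Mbps peak quoted above) is:

    # Rough conversions for comparing the links being discussed
    echo $(( 2000 / 10 ))   # ~200 MB/s usable on a 2 Gb FC port (8b/10b encoding)
    echo $(( 1000 / 8 ))    # ~125 MB/s raw on a 1 GbE port, before TCP/NFS overhead
    echo $(( 40 / 8 ))      # ~5 MB/s observed peak, a small fraction of either link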

Our network team is great and I have no doubt that any IP SAN/NFS infrastructure we ask them to build would have very low latency, but what is the relationship between the three protocols in terms of bandwidth?

Thanks,
