Has anyone used NFS for a production virtual machine datastore? What are your thoughts on NetApp or alternative solutions? Are there any reasons why NetApp is good for NFS and recommended over an iSCSI/FC solution?
What are the best practices for setting up NFS with ESX 3.5, and what general steps make it work well with good performance? Thank you for the feedback.
There have been many conversations about NFS, especially on NetApp, which is claimed to deliver very good performance. NFS doesn't require existing Fibre Channel switches and so on, and it costs less to deploy since it uses your existing LAN infrastructure. If you're building a new NFS solution, go with 10 Gb bandwidth if possible to avoid a future upgrade; Neterion seems to have a special card that runs at 10 Gb. You can find lots of good links at www.vmware-land.com, the VMworld best practices online at www.vmworld.com, and the VMware performance best practices guide, which would help a lot. Alternative solutions could be iSCSI from Dell EqualLogic or LeftHand Networks; check them out for details.
If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!
Regards,
Stefan Nguyen
iGeek Systems Inc.
VMware, Citrix, Microsoft Consultant
Here are the best practices for NFS/NetApp.
We're running it in production now (changing over from 4 Gb FC). I am extremely pleased with the results. Not only do we get solid performance, but it's much easier to provision both storage and new VMs, not to mention that backup is now trivial.
We are using a 3070 cluster with a 10 Gb connection to the switch and then two 1 Gb connections from each ESX host (~70 VMs per host).
Do you have instructions for setting up NFS in ESX, or how to implement it?
It's actually very straightforward. Here's some of the material copied from our internal wiki (collected from various sources; apologies if I'm not crediting people). It works with the defaults, but here are things you can look at. There is also a locking issue if you use VMware snapshots, but we don't.
NetApp specific:
When a cluster failover occurs on a NetApp FAS storage system, NFS datastores within VMware ESX 3.x may go offline, requiring an administrator to re-create the NFS datastores and connect them to the FlexVols presented by the FAS storage system.
Cause of this problem: the NFS client in VMware ESX 3.x has several default settings that determine the effective timeout period for NFS volumes. The default settings may not account for the length of time required for a FAS system to perform a cluster failover.
Solution:
Using VMware Virtual Center, repeat this for each ESX server in the HA datacenter using NFS datastores:
1. Select the ESX server. Select the configuration tab. Select advanced settings. Select NFS.
2. There are 3 settings that need to be changed:
NFS.HeartbeatFrequency
NFS.HeartbeatTimeout
NFS.HeartbeatMaxFailures
The NFS client relies on heartbeats to verify that the NFS volume is available. Increasing the NFS heartbeat frequency will ensure that datastore I/O can resume much sooner once the cluster failover has completed.
When considering the overall timeout period, it is necessary to understand how these 3 settings interact.
A simple formula that can be used is (X x Z) + Y = effective timeout period in seconds
Where:
X=NFS.HeartbeatFrequency
Y=NFS.HeartbeatTimeout
Z=NFS.HeartbeatMaxFailures
For example, in ESX v3.0.2, the default values are:
NFS.HeartbeatFrequency = 9
NFS.HeartbeatTimeout = 5
NFS.HeartbeatMaxFailures = 3
With NFS.HeartbeatFrequency set to 9, a heartbeat will occur every 9 seconds.
Each heartbeat will wait for 5 seconds (the value of NFS.HeartbeatTimeout) before timing out. Every time a heartbeat times out, the heartbeat failure count is increased. Once the failure count equals the value of NFS.HeartbeatMaxFailures, the datastore is marked as failed and taken offline.
So the default settings in ESX v3.0.2 result in a heartbeat every 9 seconds. Each heartbeat has 5 seconds to succeed.
This means that the first heartbeat will begin at 9 seconds and fail at 14 seconds. The second heartbeat will begin at 18 seconds and fail at 23 seconds. The third heartbeat will begin at 27 seconds and fail at 32 seconds. At this point the failure count equals 3, the value of NFS.HeartbeatMaxFailures, so the datastore is taken offline 32 seconds after the NetApp cluster failover began.
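The timeout arithmetic above can be sanity-checked in any shell (this is just the formula from earlier, evaluated with the default values):

```shell
# Effective timeout = (HeartbeatFrequency * HeartbeatMaxFailures) + HeartbeatTimeout
freq=9; timeout=5; maxfail=3             # ESX 3.0.2 defaults
echo $(( (freq * maxfail) + timeout ))   # prints 32
```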
Usually, a NetApp cluster failover operation will not complete within 32 seconds, so to ensure NFS datastores can withstand NetApp cluster failover operations, we can use the following settings:
NFS.HeartbeatFrequency = 12
NFS.HeartbeatTimeout = 5
NFS.HeartbeatMaxFailures = 10
This provides an effective timeout period of (12 x 10) + 5 = 125 seconds.
If the NetApp cluster failover begins at 0 seconds, the first heartbeat will occur at 12 seconds and fail at 17 seconds. The second heartbeat will begin at 24 seconds and fail at 29 seconds. The third will begin at 36 seconds and fail at 41 seconds. This continues until the 10th heartbeat, which begins at 120 seconds and fails at 125 seconds; the NetApp cluster failover should have completed by then, so the 9th or 10th heartbeat should not time out.
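If you prefer the service console to the VI client, the same values can also be set with esxcfg-advcfg. This is a sketch; confirm the option paths on your build with `esxcfg-advcfg -g` before relying on it:

```shell
# Set the recommended NFS heartbeat values from the ESX service console
esxcfg-advcfg -s 12 /NFS/HeartbeatFrequency
esxcfg-advcfg -s 5  /NFS/HeartbeatTimeout
esxcfg-advcfg -s 10 /NFS/HeartbeatMaxFailures
# Read a value back to verify it took effect
esxcfg-advcfg -g /NFS/HeartbeatMaxFailures
```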
To enable the NFS client in the ESX firewall, go to the Configuration tab
Select Security Profile
Click Properties...
Scroll down to NFS Client and check the box.
Click "OK"
On a fresh install you may not see anything, or you'll see the internal disks on your ESX host. If you are connected to a Fibre Channel network (being phased out in GSO), you may see the FC LUNs if you are using HBAs that were previously configured for this network (we use soft zoning).
In the VI client go to your ESX host and select the Configuration tab, then select Networking.
Create an empty switch (no port groups), then add your NICs to it.
ssh into the ESX server
esxcfg-vswitch -m 7500 vSwitchX (sets the vSwitch MTU)
esxcfg-vswitch -A NFS vSwitchX (adds a portgroup to the switch named NFS)
esxcfg-vmknic -a -i 172.1.2.X -n 255.255.0.0 -m 7500 NFS (sets the portgroups information)
esxcfg-vswitch -l (verifies that setup was correct)
Make sure the switch is set to IP Hash for load balancing
Go to the Summary page on the ESX host
Select the NFS storage group in the Datastore window and double click it
Right click and create a new folder
If it succeeds, your server has the rights it needs to run a VM off that NFS mount; please delete the folder.
Tune ESX (step 6 is not currently required as we only have 7 datastores, but will be needed soon)
Open VirtualCenter.
Select an ESX host.
In the right pane, select the Configuration tab.
In the Software box, select Advanced Configuration.
In the pop-up window, select NFS in the left pane.
Change the value of NFS.MaxVolumes to 32. See Figure 4.
In the pop-up window, select Net in the left pane.
Change the value of Net.TcpIpHeapSize to 30.
Repeat for each ESX server
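The two values above can also be changed from the service console with esxcfg-advcfg. A sketch only; the option paths below are assumed from the ESX 3.x advanced-setting names, so verify them with `esxcfg-advcfg -g` first:

```shell
# Raise the NFS mount limit and the TCP/IP heap from the ESX service console
esxcfg-advcfg -s 32 /NFS/MaxVolumes
esxcfg-advcfg -s 30 /Net/TcpipHeapSize
# A heap size change requires a host reboot to take effect
```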
Make sure you enabled load balancing on the NFS vSwitch
ssh into the ESX server. See if you can vmkping the target
See if you can vmkping -s 7500 the target (uses a jumbo frame)
See if you can navigate to /vmfs/volumes and change into the NFS directory
Disable last access stamp
On the filer run vol options <vol-name> no_atime_update on
This will disable last access time stamps when accessing files via NFS.
When creating the VMkernel ports for NFS, did you use a different IP address for each ESX host?
I'd recommend using a single address but using NIC teaming (EtherChannel) to aggregate bandwidth. That way you have failover at the network level. You probably want to do the same thing with the NICs on your NFS server, so if one goes down you don't lose all the VMs associated with it.
Great... now how do I do this in Lab Manager 2.5.3? In Lab Manager you mount the NFS storage directory from within the GUI, not from VirtualCenter. I am having performance problems and need to adjust some settings. I would love to follow this guide, but it seems I can't, since ESX didn't mount the storage; Lab Manager did.
Mike
Hi, the post above was excellent. However, after implementing timeouts of 185 seconds and 245 seconds, respectively, as a test, my NFS datastores still become inaccessible, along with the VMs on them, after around 1 minute. Any other thoughts? BTW, I'm using 3.5i.
I'd call VMware for help; that does not sound right. NFS may not be the most efficient protocol for storage, but it should be very reliable. It sounds like there is a network issue here.
Can you vmkping the NFS server non-stop and see if there is an interruption? I am not familiar with embedded ESX, so there may be a gotcha there.
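A quick way to do that from the service console is a loop like the following. The filer address is a placeholder for your NFS server's VMkernel-reachable IP, and the `-c` flag is assumed to behave like standard ping's count option:

```shell
# Ping the NFS server once a second via the VMkernel stack; log any drops
while true; do
  vmkping -c 1 172.1.2.10 > /dev/null 2>&1 || echo "drop at $(date)"
  sleep 1
done
```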
One extra setting (thanks to Scott Lowe and VMware): we had to do this on our systems, but I didn't update the above post. You want to increase the amount of memory available to your ESX host to 800 MB. We have large servers (64 GB x3850 M2s) running ~45 VMs each and were having weird problems (VMotions taking a long time, Storage VMotions failing/hanging, ESX hosts disconnecting in VI, etc.) until updating this setting.
Just a note that TR-3428 has changed quite a bit since it was posted in the thread above; it was last modified in September 2008 (the post above has the 2007 version). The NFS lock recommendation change is one of the most prominent items.
If downloading it, I'd just always pull it from the NetApp site to ensure you get the latest version.
http://www.netapp.com/us/library/technical-reports/tr-3428.html
And I'd highly recommend TR-3593 as well for storage alignment (this is good for any storage system, just with different offsets depending on your storage vendor).
If you are looking at running VMware ESX 3.5 with NFS then have a look at this product:
We are currently evaluating this product, which we loaded onto a Sun X4500 Thumper with 48 x 500 GB SATA drives, and so far we are finding the performance to be pretty good.
Our current production systems run on an old IBM FAStT700, which is 2 Gb Fibre.
We have replicated the production system on some new IBM x3650 servers that are connected to the Thumper over teamed Intel 1000 adapters.
We have set up both NFS and iSCSI on the Thumper and then connected our ESX servers to it.
We have restored our FULL production system onto the new ESX Servers and the Thumper and are currently carrying out extensive testing.
Initial results show that Nexenta with the Thumper and NFS outperforms our current 2 Gb Fibre setup by about 30%.
We are also about to test iSCSI.
I highly recommend looking at Nexenta and keeping an eye on where this product is heading.
Have FUN!
Michael Pozar
For NetApp / VMware alignment best practices, we'd prefer that you refer to TR-3747 - http://media.netapp.com/documents/tr-3747.pdf. The other technical report is outdated.
I recommend you all read this:
It has plenty of information on NFS and ESX.
Hope it is helpful for you all.
regards
Jose Ruelas