16 Replies. Latest reply on Jun 19, 2009 10:51 AM by jbruelasdgo

    best practices for nfs and esx 3.5

    vmwareluverz Hot Shot

       

      Has anyone used NFS as a datastore for production virtual machines?  What are your thoughts on NetApp or alternative solutions?  Are there any reasons why NetApp is a good fit for NFS and recommended over an iSCSI/FC solution?

       

       

      What are the best practices for setting up NFS with ESX 3.5, and what general steps make it work well with good performance? Thank you for the feedback.

       

       

        • 1. Re: best practices for nfs and esx 3.5
          azn2kew Champion

          There have been many conversations about NFS, especially with NetApp, which is claimed to be really fast with good performance.  NFS doesn't require existing Fibre Channel HBAs, switches, etc., and costs less to deploy since it uses the existing LAN infrastructure.  If you're building a new NFS solution, go with 10Gb bandwidth if possible to avoid a future upgrade; Neterion has a specialized card that runs at 10Gb.  You can find lots of good links at www.vmware-land.com, the VMworld best-practices sessions at www.vmworld.com, and the VMware performance best-practices guide, which helps a lot.  Alternative solutions could be iSCSI from Dell EqualLogic or LeftHand Networks; check them out for details.

           

          If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

           

          Regards,

           

          Stefan Nguyen

          iGeek Systems Inc.

          VMware, Citrix, Microsoft Consultant

          • 2. Re: best practices for nfs and esx 3.5
            azn2kew Champion

            Here are the best practices for NFS/NetApp.

             

            If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!

             

            Regards,

             

            Stefan Nguyen

            iGeek Systems Inc.

            VMware, Citrix, Microsoft Consultant

            1 person found this helpful
            • 3. Re: best practices for nfs and esx 3.5
              jeremypage Hot Shot

               

              We're running it in production now (changing over from 4Gb FC), and I am extremely pleased with the results. Not only do we get solid performance, but it's also much easier to provision both storage and new VMs, not to mention that backups are now trivial.

              We are using a 3070 cluster with a 10Gb connection to the switch and then two 1Gb connections from each ESX host (~70 VMs per host).

               

               

              1 person found this helpful
              • 4. Re: best practices for nfs and esx 3.5
                vmwareluverz Hot Shot

                Do you have instructions for setting up NFS in ESX, or how to implement it?

                • 5. Re: best practices for nfs and esx 3.5
                  jeremypage Hot Shot

                   

                  It's actually super straightforward. Here's some of the material copied from our internal wiki (stolen from various sources, sorry if I am not crediting people). It works with the defaults, but here is what you can look at. There is also a locking issue if you use VMware snapshots, but we don't.

                   

                   

                  NetApp specific:

                   

                   

                  When a cluster failover occurs on a NetApp FAS storage system, NFS datastores within VMware ESX 3.x may go offline, requiring an administrator to re-create the NFS datastores and connect them to the FlexVols presented from the FAS storage system.

                  Cause of this problem: the NFS client in VMware ESX 3.x has several default settings that determine the effective timeout period for NFS volumes. The default settings may not account for the length of time required for a FAS system to perform a cluster failover.

                  Solution:

                   

                   

                  Using VMware VirtualCenter, repeat the following for each ESX server in the HA datacenter that uses NFS datastores:

                   

                   

                  1.     Select the ESX server. Select the Configuration tab. Select Advanced Settings. Select NFS.

                   

                   

                  2.     There are 3 settings that need to be changed:

                   

                   

                  NFS.HeartbeatFrequency
                  NFS.HeartbeatTimeout
                  NFS.HeartbeatMaxFailures

                  The NFS client relies on heartbeats to verify that an NFS volume is still available. Adjusting these heartbeat settings ensures the datastore is not marked offline during the failover, so datastore I/O can resume as soon as the cluster failover has completed.

                   

                   

                  When considering the overall timeout period, it is necessary to understand how these 3 settings interact.

                   

                   

                  A simple formula that can be used is  (X x Z) + Y = effective timeout period in seconds

                   

                   

                  Where:

                  X = NFS.HeartbeatFrequency
                  Y = NFS.HeartbeatTimeout
                  Z = NFS.HeartbeatMaxFailures

                   

                   

                   

                   

                   

                  For example, in ESX v3.0.2, the default values are:

                   

                  NFS.HeartbeatFrequency = 9 
                  NFS.HeartbeatTimeout = 5 
                  NFS.HeartbeatMaxFailures = 3 
                  With NFS.HeartbeatFrequency set to 9, a heartbeat will occur every 9 seconds.
                  

                   

                  Each heartbeat will wait for 5 seconds (the value of NFS.HeartbeatTimeout) before timing out.

                   

                   

                  Every time a heartbeat time out occurs, the heartbeat failures count is increased.

                   

                   

                  Once the heartbeat failure count equals the value set with NFS.HeartbeatMaxFailures, the datastore is marked as failed and taken offline.

                   

                   

                  So the default settings in ESX v3.0.2 result in a heartbeat every 9 seconds. Each heartbeat has 5 seconds to succeed.

                   

                   

                  This means that the first heartbeat will begin at 9 seconds and fail at 14 seconds. The second heartbeat will begin at 18 seconds and fail at 23 seconds. The third heartbeat will begin at 27 seconds and fail at 32 seconds. At this point the failure count equals 3, which is the value of NFS.HeartbeatMaxFailures.

                   

                   

                  At this point, because the MaxFailures threshold has been reached, the datastore is taken offline 32 seconds after the NetApp cluster failover began.

                   

                   

                  Usually, a NetApp cluster failover operation will not complete within 32 seconds, so to ensure NFS datastores can withstand NetApp cluster failover operations, we can use the following settings:

                   

                  NFS.HeartbeatFrequency = 12 
                  NFS.HeartbeatTimeout = 5 
                  NFS.HeartbeatMaxFailures = 10 
                  

                   

                  This provides an effective timeout period of (12 x 10) + 5 = 125 seconds.

                   

                   

                  If the NetApp cluster failover begins at 0 seconds, then the first heartbeat will occur at 12 seconds and fail at 17 seconds. The second heartbeat will begin at 24 seconds and fail at 29 seconds. The third heartbeat will begin at 36 seconds and fail at 41 seconds. This continues until the 10th heartbeat, which begins at 120 seconds and fails at 125 seconds, although the NetApp cluster failover should have completed by this time, so the 9th or 10th heartbeat should not time out.
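
                  If you prefer to make these changes from the service console rather than the VI Client, something like the following should work on ESX 3.x. This is just a sketch using the values recommended above; verify the exact option paths on your own build (esxcfg-advcfg -g reads a value back):

                  • esxcfg-advcfg -s 12 /NFS/HeartbeatFrequency               (sets NFS.HeartbeatFrequency to 12)
                  • esxcfg-advcfg -s 5 /NFS/HeartbeatTimeout                  (sets NFS.HeartbeatTimeout to 5)
                  • esxcfg-advcfg -s 10 /NFS/HeartbeatMaxFailures             (sets NFS.HeartbeatMaxFailures to 10)
                  • esxcfg-advcfg -g /NFS/HeartbeatMaxFailures                (reads the value back to confirm the change)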

                   

                   

                   

                   

                  <span class="mw-headline">Opening the firewall for NFS client traffic

                  • Go to the Configuration tab

                  • Select Security Profile

                  • Click Properties...

                  • Scroll down to NFS Client and check the box.

                  • Click "OK"
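
                  The same thing can be done from the service console if you are scripting host builds. On our ESX 3.x hosts the NFS client firewall service is, as far as I know, named nfsClient, so double-check the name with the query option first:

                  • esxcfg-firewall -q                                        (lists the current firewall configuration and services)
                  • esxcfg-firewall -e nfsClient                              (enables the NFS Client rule, opening NFS traffic)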

                   

                  <span class="editsection">[[edit|http://gsovmwiki01/mediawiki/index.php?title=Configuring_NFS_storage&action=edit&section=2|Edit section: Adding the NFS stores]] <span class="mw-headline"> Adding the NFS stores

                   

                   

                  On a fresh install you may not see anything, or you'll see the internal disks on your ESX host. If you are connected to a Fibre Channel network (being phased out in GSO), you may see the FC LUNs if you are using HBAs that have previously been configured for that network (we use soft zoning).

                   

                  • In the VI Client go to your ESX host, select the Configuration tab, then select Networking.
                  • Create an empty vSwitch (no port groups) and add your NICs to it.
                  • ssh into the ESX server
                  • esxcfg-vswitch -m 7500 vSwitchX                     (sets the vSwitch MTU)
                  • esxcfg-vswitch -A NFS vSwitchX                      (adds a port group named NFS to the switch)
                  • esxcfg-vmknic -a -i 172.1.2.X -n 255.255.0.0 -m 7500 NFS      (sets the port group's VMkernel IP information)
                  • esxcfg-vswitch -l                                   (verifies that the setup is correct)
                  • Make sure the switch is set to IP Hash for load balancing
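
                  With the VMkernel networking in place, the export itself can be mounted either from the VI Client (Configuration > Storage > Add Storage > Network File System) or from the service console with esxcfg-nas. The filer address and export path below are made-up examples, so substitute your own:

                  • esxcfg-nas -a -o 172.1.2.1 -s /vol/vmware_nfs NFS01       (adds an NFS datastore named NFS01 from filer 172.1.2.1, export /vol/vmware_nfs)
                  • esxcfg-nas -l                                             (lists the NFS datastores so you can verify the mount)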

                   

                  <span class="editsection">[[edit|http://gsovmwiki01/mediawiki/index.php?title=Configuring_NFS_storage&action=edit&section=3|Edit section: Verifying NFS access]] <span class="mw-headline"> Verifying NFS access

                  • Go to the Summary page on the ESX host
                  • Select the NFS datastore in the Datastore window and double-click it
                  • Right-click and create a new folder
                  • If it succeeds, your server has the rights it needs to run a VM off that NFS mount; please delete the test folder afterwards

                   

                  <span class="editsection">[[edit|http://gsovmwiki01/mediawiki/index.php?title=Configuring_NFS_storage&action=edit&section=4|Edit section: Tuning NFS]] <span class="mw-headline"> Tuning NFS

                  • Tune ESX (step 6 is not currently required since we only have 7 datastores, but it will be needed soon)

                  1. Open VirtualCenter.

                  2. Select an ESX host.

                  3. In the right pane, select the Configuration tab.

                  4. In the Software box, select Advanced Configuration.

                  5. In the pop-up window, select NFS in the left pane.

                  6. Change the value of NFS.MaxVolumes to 32.

                  7. In the pop-up window, select Net in the left pane.

                  8. Change the value of Net.TcpIpHeapSize to 30.

                  9. Repeat for each ESX server
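
                  If you would rather script steps 6 and 8 than click through the VI Client on every host, esxcfg-advcfg should do it from the service console. This is a sketch; verify the exact option names on your build, and note that the TCP/IP heap size change generally requires a host reboot to take effect:

                  • esxcfg-advcfg -s 32 /NFS/MaxVolumes                       (raises the NFS datastore limit to 32)
                  • esxcfg-advcfg -s 30 /Net/TcpIpHeapSize                    (raises the TCP/IP heap size to 30)
                  • esxcfg-advcfg -g /NFS/MaxVolumes                          (reads a value back to confirm the change)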

                   

                  <span class="editsection">[[edit|http://gsovmwiki01/mediawiki/index.php?title=Configuring_NFS_storage&action=edit&section=5|Edit section: Troubleshooting NFS problems]] <span class="mw-headline"> Troubleshooting NFS problems

                  • Make sure you enabled load balancing on the NFS vSwitch
                  • ssh into the ESX server and see if you can vmkping the target
                  • See if you can vmkping -s 7500 the target (uses a jumbo-sized packet)
                  • See if you can navigate to /vmfs/volumes and change into the NFS directory (concrete examples of these checks follow below)

                   

                  • Disable last access time stamps
                  On the filer, run: vol options <vol-name> no_atime_update on
                  This disables last-access time stamps when files are accessed via NFS.
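
                  As a concrete example of the checks above (the address 172.1.2.1 is only a placeholder for your filer's NFS-facing IP):

                  • vmkping 172.1.2.1                                         (basic reachability test over the VMkernel network)
                  • vmkping -s 7500 172.1.2.1                                 (sends a jumbo-sized packet; if this fails while the plain vmkping works, jumbo frames are not configured end to end)
                  • cd /vmfs/volumes ; ls                                     (mounted NFS datastores should show up here)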

                  • 6. Re: best practices for nfs and esx 3.5
                    ikefuentes Novice

                    When creating the VMkernel ports for NFS, did you use a different IP address for each ESX host?

                    • 7. Re: best practices for nfs and esx 3.5
                      mbrown2010 Novice

                       

                      You have to... otherwise you will have IP conflicts on your network.

                       

                       

                      You can, however, present multiple IPs on your NFS storage and set up multiple mounts on your ESX hosts to gain more bandwidth.
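
                      For example, if the filer answers on two addresses, each datastore can be mounted through a different one so the traffic hashes onto different links. The addresses and export names here are hypothetical:

                      • esxcfg-nas -a -o 172.1.2.1 -s /vol/nfs_ds1 NFS01      (first datastore mounted via the filer's first IP)
                      • esxcfg-nas -a -o 172.1.2.2 -s /vol/nfs_ds2 NFS02      (second datastore mounted via the filer's second IP)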

                       

                       

                      Matt Brown

                      http://universitytechnology.blogspot.com 

                       

                       

                      • 8. Re: best practices for nfs and esx 3.5
                        jeremypage Hot Shot

                        I'd recommend using a single address but using NIC teaming (EtherChannel) to aggregate bandwidth. That way you have failover at the network level. You probably want to do the same thing with the NICs on your NFS server so that if one goes down you don't lose all the VMs associated with it.

                        • 9. Re: best practices for nfs and esx 3.5
                          mfarace Novice

                           

                          Great... now how do I do this in Lab Manager 2.5.3?  In Lab Manager you mount the NFS storage directory from within the GUI, not from VirtualCenter.  I am having performance problems and need to adjust some settings.  I would love to follow this guide, but it seems I can't, since ESX didn't mount the storage, Lab Manager did.

                           

                           

                          Mike

                           

                           

                          • 10. Re: best practices for nfs and esx 3.5
                            msbooth Novice

                            Hi, the post above was excellent; however, after implementing timeouts of 185 seconds and 245 seconds as a test, my NFS datastores, as well as the VMs on them, still become inaccessible after around 1 minute.  Any other thoughts?  BTW, I am using ESX 3.5i.

                            • 11. Re: best practices for nfs and esx 3.5
                              jeremypage Hot Shot

                               

                              I'd call VMware for help; that does not sound right. NFS may not be the most efficient protocol for storage, but it should be very reliable. It sounds like there is a network issue here.

                               

                               

                              Can you vmkping the NFS server non-stop and see if there is an interruption? I am not familiar with embedded ESX, so there may be a gotcha there.

                               

                               

                              • 12. Re: best practices for nfs and esx 3.5
                                jeremypage Hot Shot

                                One extra setting (thanks to Scott Lowe and VMware): we had to do this on our systems, but I didn't update the post above. You want to increase the amount of memory available to the ESX service console to 800 MB. We have large servers (64 GB x3850 M2s) running ~45 VMs each and were having weird problems (VMotions taking a long time, Storage VMotions failing/hanging, ESX hosts disconnecting in VI, etc.) until we updated this setting.

                                • 13. Re: best practices for nfs and esx 3.5
                                  andriven Enthusiast
                                  vExpert

                                  Just a note that TR-3428 has changed quite a bit since it was posted in the thread above; it was last modified in September 2008 (the post above has the 2007 version), and the change to the NFS locking recommendation is one of the most prominent items.

                                   

                                  If downloading it, I'd just always pull it from the NetApp site to ensure you get the latest version.

                                   

                                  http://www.netapp.com/us/library/technical-reports/tr-3428.html

                                   

                                  And I'd highly recommend TR-3593 as well for storage alignment (this is good for any storage system; just use different offsets depending on your storage vendor).

                                   

                                  http://media.netapp.com/documents/tr_3593.pdf

                                  • 14. Re: best practices for nfs and esx 3.5
                                    mpozar Enthusiast

                                     

                                    If you are looking at running VMware ESX 3.5 with NFS then have a look at this product:

                                     

                                     

                                           www.nexenta.com

                                     

                                     

                                     

                                     

                                     

                                    We are currently evaluating this product, which we loaded onto a Sun X4500 Thumper with 48 x 500GB SATA drives, and so far we are finding the performance to be pretty good.

                                     

                                     

                                    Our current production systems run on an old IBM FAStT700, which is a 2Gb Fibre Channel array.

                                     

                                     

                                    We have replicated the production system on some new IBM x3650 servers that are connected to the Thumper over teamed Intel 1000 adapters.

                                     

                                     

                                    We have set up both NFS and iSCSI on the Thumper and then connected our ESX servers to it.

                                     

                                     

                                    We have restored our full production system onto the new ESX servers and the Thumper and are currently carrying out extensive testing.

                                     

                                     

                                    Initial results show that Nexenta on the Thumper with NFS outperforms our current 2Gb Fibre setup by about 30%.

                                     

                                     

                                    We are also about to test iSCSI.

                                     

                                     

                                    I highly recommend looking at Nexenta and keeping an eye on where this product is heading.

                                    Have FUN!

                                    Michael Pozar