Has anyone used NFS for a production virtual machine datastore? What are your thoughts on NetApp or alternative solutions? Are there any reasons why NetApp is good for NFS and recommended over an iSCSI/FC solution?
What are the best practices for setting up NFS with ESX 3.5, and what general steps make it work well with good performance? Thank you for the feedback.
There have been many conversations about NFS, especially on NetApp, which is claimed to deliver very good performance. NFS doesn't require existing Fibre Channel switches and so on, and it costs less to deploy since it uses your existing LAN infrastructure. If you're building a new NFS solution, go with 10 Gb bandwidth if possible to avoid a future upgrade; Neterion seems to have a special card that runs at 10 Gb. You can find lots of good links at www.vmware-land.com, the VMworld best practices online at www.vmworld.com, and the VMware performance best practices guide, which would help a lot. Alternative solutions could be iSCSI from Dell EqualLogic or LeftHand Networks; check them out for details.
If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!
Regards,
Stefan Nguyen
iGeek Systems Inc.
VMware, Citrix, Microsoft Consultant
Here are the best practices for NFS/NetApp.
We're running it in production now (changing over from 4 Gb FC). I am extremely pleased with the results. Not only do we get solid performance, but it's much easier to provision both storage and new VMs, not to mention that backup is now trivial.
We are using a 3070 cluster with a 10 Gb connection to the switch and then two 1 Gb connections from each ESX host (~70 VMs per host).
Do you have instructions for setting up NFS in ESX, or how to implement it?
It's actually very straightforward. Here's some of the material copied from our internal wiki (collected from various sources; apologies if I'm not crediting people). It works with the defaults, but here are things you can look at. There is also a locking issue if you use VMware snapshots, but we don't.
NetApp specific:
When a cluster failover occurs on a NetApp FAS storage system, NFS datastores within VMware ESX 3.x may go offline, requiring an administrator to re-create the NFS datastores and connect them to the FlexVols presented by the FAS storage system.
Cause of this problem: the NFS client in VMware ESX 3.x has several default settings that determine the effective timeout period for NFS volumes. The default settings may not account for the length of time required for a FAS system to perform a cluster failover.
Solution:
Using VMware Virtual Center, repeat this for each ESX server in the HA datacenter using NFS datastores:
1. Select the ESX server. Select the configuration tab. Select advanced settings. Select NFS.
2. There are 3 settings that need to be changed:
NFS.HeartbeatFrequency
NFS.HeartbeatTimeout
NFS.HeartbeatMaxFailures
The NFS client relies on heartbeats to verify that the NFS volume is available. Increasing the NFS heartbeat frequency will ensure that datastore I/O can resume much sooner once the cluster failover has completed.
When considering the overall timeout period, it is necessary to understand how these 3 settings interact.
A simple formula that can be used is (X x Z) + Y = effective timeout period in seconds
Where:
X=NFS.HeartbeatFrequency
Y=NFS.HeartbeatTimeout
Z=NFS.HeartbeatMaxFailures
For example, in ESX v3.0.2, the default values are:
NFS.HeartbeatFrequency = 9
NFS.HeartbeatTimeout = 5
NFS.HeartbeatMaxFailures = 3
With NFS.HeartbeatFrequency set to 9, a heartbeat will occur every 9 seconds.
Each heartbeat will wait for 5 seconds (the value of NFS.HeartbeatTimeout) before timing out. Every time a heartbeat times out, the heartbeat failure count is increased. Once the failure count equals the value of NFS.HeartbeatMaxFailures, the datastore is marked as failed and taken offline.
So the default settings in ESX v3.0.2 result in a heartbeat every 9 seconds. Each heartbeat has 5 seconds to succeed.
This means that the first heartbeat will begin at 9 seconds and fail at 14 seconds. The second heartbeat will begin at 18 seconds and fail at 23 seconds. The third heartbeat will begin at 27 seconds and fail at 32 seconds. At this point the failure count equals 3, the value of NFS.HeartbeatMaxFailures, so the datastore is taken offline 32 seconds after the NetApp cluster failover began.
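The timeout arithmetic above can be sanity-checked in any shell (this is just the formula from earlier, evaluated with the default values):

```shell
# Effective timeout = (HeartbeatFrequency * HeartbeatMaxFailures) + HeartbeatTimeout
freq=9; timeout=5; maxfail=3             # ESX 3.0.2 defaults
echo $(( (freq * maxfail) + timeout ))   # prints 32
```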
Usually, a NetApp cluster failover operation will not complete within 32 seconds, so to ensure NFS datastores can withstand NetApp cluster failover operations, we can use the following settings:
NFS.HeartbeatFrequency = 12
NFS.HeartbeatTimeout = 5
NFS.HeartbeatMaxFailures = 10
This provides an effective timeout period of (12 x 10) + 5 = 125 seconds.
If the NetApp cluster failover begins at 0 seconds, the first heartbeat will occur at 12 seconds and fail at 17 seconds. The second heartbeat will begin at 24 seconds and fail at 29 seconds. The third will begin at 36 seconds and fail at 41 seconds. This continues until the 10th heartbeat, which begins at 120 seconds and fails at 125 seconds; the NetApp cluster failover should have completed by then, so the 9th or 10th heartbeat should not time out.
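If you prefer the service console to the VI client, the same values can also be set with esxcfg-advcfg. This is a sketch; confirm the option paths on your build with `esxcfg-advcfg -g` before relying on it:

```shell
# Set the recommended NFS heartbeat values from the ESX service console
esxcfg-advcfg -s 12 /NFS/HeartbeatFrequency
esxcfg-advcfg -s 5  /NFS/HeartbeatTimeout
esxcfg-advcfg -s 10 /NFS/HeartbeatMaxFailures
# Read a value back to verify it took effect
esxcfg-advcfg -g /NFS/HeartbeatMaxFailures
```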
To enable the NFS client in the ESX firewall, go to the Configuration tab
Select Security Profile
Click Properties...
Scroll down to NFS Client and check the box.
Click "OK"
On a fresh install you may not see anything, or you'll see the internal disks on your ESX host. If you are connected to a Fibre Channel network (being phased out in GSO), you may see the FC LUNs if you are using HBAs that were previously configured for this network (we use soft zoning).
In the VI client go to your ESX host and select the Configuration tab, then select Networking.
Create an empty switch (no port groups), then add your NICs to it.
ssh into the ESX server
esxcfg-vswitch -m 7500 vSwitchX (sets the vSwitch MTU)
esxcfg-vswitch -A NFS vSwitchX (adds a portgroup to the switch named NFS)
esxcfg-vmknic -a -i 172.1.2.X -n 255.255.0.0 -m 7500 NFS (sets the portgroups information)
esxcfg-vswitch -l (verifies that setup was correct)
Make sure the switch is set to IP Hash for load balancing
Go to the Summary page on the ESX host
Select the NFS storage group in the Datastore window and double click it
Right click and create a new folder
If it succeeds, your server has the rights it needs to run a VM off that NFS mount; please delete the folder.
Tune ESX (step 6 is not currently required as we only have 7 datastores, but will be needed soon)
Open VirtualCenter.
Select an ESX host.
In the right pane, select the Configuration tab.
In the Software box, select Advanced Configuration.
In the pop-up window, select NFS in the left pane.
Change the value of NFS.MaxVolumes to 32. See Figure 4.
In the pop-up window, select Net in the left pane.
Change the value of Net.TcpIpHeapSize to 30.
Repeat for each ESX server
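The two values above can also be changed from the service console with esxcfg-advcfg. A sketch only; the option paths below are assumed from the ESX 3.x advanced-setting names, so verify them with `esxcfg-advcfg -g` first:

```shell
# Raise the NFS mount limit and the TCP/IP heap from the ESX service console
esxcfg-advcfg -s 32 /NFS/MaxVolumes
esxcfg-advcfg -s 30 /Net/TcpipHeapSize
# A heap size change requires a host reboot to take effect
```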
Make sure you enabled load balancing on the NFS vSwitch
ssh into the ESX server. See if you can vmkping the target
See if you can vmkping -s 7500 the target (uses a jumbo frame)
See if you can navigate to /vmfs/volumes and change into the NFS directory
Disable last access stamp
On the filer run vol options <vol-name> no_atime_update on
This will disable last access time stamps when accessing files via NFS.
When creating the VMkernel ports for NFS, did you use a different IP address for each ESX host?
I'd recommend using a single address but using NIC teaming (EtherChannel) to aggregate bandwidth. That way you have failover at the network level. You probably want to do the same thing with the NICs on your NFS server, so if one goes down you don't lose all the VMs associated with it.
Great... now how do I do this in Lab Manager 2.5.3? In Lab Manager you mount the NFS storage directory from within the GUI, not from VirtualCenter. I am having performance problems and need to adjust some settings. I would love to follow this guide, but it seems I can't, since ESX didn't mount the storage; Lab Manager did.
Mike
Hi, the post above was excellent. However, after implementing timeouts of 185 seconds and 245 seconds, respectively, as a test, my NFS datastores still become inaccessible, along with the VMs on them, after around 1 minute. Any other thoughts? BTW, I'm using 3.5i.
I'd call VMware for help; that does not sound right. NFS may not be the most efficient protocol for storage, but it should be very reliable. It sounds like there is a network issue here.
Can you vmkping the NFS server non-stop and see if there is an interruption? I am not familiar with embedded ESX, so there may be a gotcha there.
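A quick way to do that from the service console is a loop like the following. The filer address is a placeholder for your NFS server's VMkernel-reachable IP, and the `-c` flag is assumed to behave like standard ping's count option:

```shell
# Ping the NFS server once a second via the VMkernel stack; log any drops
while true; do
  vmkping -c 1 172.1.2.10 > /dev/null 2>&1 || echo "drop at $(date)"
  sleep 1
done
```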
One extra setting (thanks to Scott Lowe and VMware): we had to do this on our systems, but I didn't update the above post. You want to increase the amount of memory available to your ESX host to 800 MB. We have large servers (64 GB x3850 M2s) running ~45 VMs each and were having weird problems (VMotions taking a long time, Storage VMotions failing/hanging, ESX hosts disconnecting in VI, etc.) until updating this setting.
Just a note that TR-3428 has changed quite a bit since it was posted in the thread above; it was last modified in September 2008 (the post above has the 2007 version). The NFS lock recommendation change is one of the most prominent items.
If downloading it, I'd just always pull it from the NetApp site to ensure you get the latest version.
http://www.netapp.com/us/library/technical-reports/tr-3428.html
And I'd highly recommend TR-3593 as well for storage alignment (this is good for any storage system, just with different offsets depending on your storage vendor).
If you are looking at running VMware ESX 3.5 with NFS then have a look at this product:
We are currently evaluating this product, which we loaded onto a Sun X4500 Thumper with 48 x 500 GB SATA drives, and so far we are finding the performance to be pretty good.
Our current production systems run on an old IBM FAStT700, which is 2 Gb Fibre.
We have replicated the production system on some new IBM x3650 servers that are connected to the Thumper over teamed Intel 1000 adapters.
We have set up both NFS and iSCSI on the Thumper and then connected our ESX servers to it.
We have restored our FULL production system onto the new ESX Servers and the Thumper and are currently carrying out extensive testing.
Initial results show that Nexenta with the Thumper and NFS outperforms our current 2 Gb Fibre setup by about 30%.
We are also about to test iSCSI.
I highly recommend looking at Nexenta and keeping an eye on where this product is heading.
Have FUN!
Michael Pozar
For NetApp / VMware alignment best practices, we'd prefer that you refer to TR-3747 - http://media.netapp.com/documents/tr-3747.pdf. The other technical report is outdated.
I recommend you all read this:
It has plenty of information on NFS and ESX.
Hope it is helpful for you all.
regards
Jose Ruelas