VMware Cloud Community
hagr
Contributor

Slow NFS datastores to FreeBSD server (might apply to Linux NFS servers, too)

Hi,

I've been trying to figure out where the bottleneck is on our system for the past week. It's the usual 5 MB/s performance on VMs running on an NFS datastore. It doesn't go up no matter how many connections to the NFS server I try, nor with a larger "payload" (using dd with different bs values), whether from VMs or from datastore transfers initiated in ESX. The NFS server is a tweaked FreeBSD 7.1 box with ZFS, exported with the nfsd flags "-t -n 12". I've tested the same NFS share from within a FreeBSD client (a VM on the same ESX host), but with explicit read/write packet sizes (nfs_mount -o w=1024 r=1024), and I get speeds exceeding 25 MB/s. I've also run a few tests with iSCSI on the same servers. It gives approximately the same result as the VM mounting NFS directly (25 MB/s), but with much higher network usage (51 MB/s).
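For reference, here is a minimal sketch of the dd probe described above. The target path is a placeholder: on ESX it would be a file on the NFS datastore under /vmfs/volumes/, but /tmp is used as the default here only so the sketch runs anywhere.

```shell
# Sketch of the dd throughput test. TARGET is a placeholder --
# point it at a file on the NFS datastore (e.g. under /vmfs/volumes/)
# to measure the actual datastore path.
TARGET=${TARGET:-/tmp/ddtest.bin}

# Write test: 8 MB in 1 MB blocks; dd prints the achieved rate when done.
dd if=/dev/zero of="$TARGET" bs=1M count=8

# Read test: stream the same file back out.
dd if="$TARGET" of=/dev/null bs=1M
```

Varying bs here is the "larger payload" experiment mentioned above; in my testing the rate barely moves with block size when going through ESX.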

My conclusion so far is that the ESX and the NFS server are not communicating well.

The ESX host is an HP DL360 with dual GigE, connected via static 802.3ad trunking to a ProCurve 54XXzl switch, which in turn has an LACP trunk to the NFS server, an HP DL380 with an MSA2000 and dual GigE. Both the ESX host and the NFS server are running jumbo frames, also enabled on the VLAN on the switch. (AFAIK this isn't supported on VMware's side, but I'm doing it to try to figure out where the bottleneck is.)

I've been searching from the bottom of the Internet to the top of my desk, and I have found no information on what parameters ESX 3.5 uses for its NFS mounts. I'm also curious how the NetApp storage solution implements this, since I'm getting the impression that no one has any performance issues with them.

My question is: anybody out there with experience running ESX with NFS on *nix? I'm thinking either no one has ever bothered looking into this, or they gave up without making a sound...

Am I right that ESX 3.5 uses NFSv3 and TCP only? Are there any flags whatsoever you can set (besides bufsize, async, etc. in VC -> Advanced -> NFS)? I'm thinking of packet sizes.

Does anyone know of an article describing ESX' NFS-connectivity more thoroughly?

Please, don't be afraid to give me a reply. I'm desperate, and any hint or tip is welcome.

For mods: please move this thread if you see fit. I'm new to the community forums, and this was my best guess. :)

3 Replies
dennispedersen7
Contributor

First of all, I'm not a big *nix guru, but I have noticed the same thing as you.

With a FreeBSD NFS server, the performance is poor.

Once I switched it out for Ubuntu and used the async parameter on the NFS share, I'm now getting 30+ megabytes/s to my test NFS server (just a plain HP DC 7900 workstation).

I never figured out what went wrong with FreeBSD; I too spent lots of time on Google trying to solve it. Finally I just gave up and installed Ubuntu instead.
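For anyone wanting to try the same thing, a sketch of the relevant /etc/exports line on the Ubuntu side (the export path and client subnet are placeholders for your own values):

```
# /etc/exports -- path and subnet are examples only.
# "async" lets the server acknowledge writes before they reach disk,
# which is what masks the sync penalty (at the cost of data safety
# if the server crashes mid-write).
/export/vms 192.168.0.0/24(rw,async,no_subtree_check)
```

After editing, reload the export table with `exportfs -ra`.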

natewilson
Contributor

Did you ever come up with any improvements? I'm having the same issue with FreeBSD/FreeNAS. When transferring from my Linux server, I get ~25 MB/s. With ESX, it's more like 5 MB/s.

LucasAlbers
Expert

Anecdotally, this sounds like the difference in performance with caching enabled, either via the NFS server or the on-board disk cache.

If the client does not trust the server to cache, that would explain this level of performance.

http://communities.vmware.com/thread/105216

From that discussion, [~46561] said:

"

I believe the ESX kernel mounts the NFS datastore with the "sync" option, which tells the NFS server to make sure everything's successfully committed to disk (not just in cache) before responding that the write has succeeded. You can see this in a Wireshark or tcpdump trace of the NFS writes: they have the FILE_SYNC flag set. Without this flag, the NFS server is able to respond "write completed" as soon as the data is copied into system memory. With FILE_SYNC, there's an actual disk write operation before the ESX server can continue on.

Now, in the case of dedicated file servers (NetApp filers or the like), the hardware usually has a battery-backed RAM system so that it can respond "write completed" as soon as the request is safely stored there. On commodity systems (where you're running OpenFiler, for example), that isn't available.

"
