VMware Cloud Community
davewelsh
Contributor

Best Practices for NFS Datastore

I'm trying to get some advice on how to best set up an NFS server to use with ESXi as a datastore. I took a stab at it with CentOS 7, but the performance is abysmal. I'm hoping someone can point out some optimizations I've overlooked, but I'm open to trying another free OS as well.

I have an old Dell PowerEdge T310 with a SAS 6i/r hard drive controller. I have two 2 TB hard drives and two 1 TB hard drives. Due to the limitations of the SAS 6i/r controller, I left the drives independent and went with software RAID 1 plus LVM to get 3 TB of usable space, like this:

# mdadm --create /dev/md0 --run --level=1 --raid-devices=2 /dev/sdd /dev/sde

# mdadm --create /dev/md1 --run --level=1 --raid-devices=2 /dev/sdf /dev/sdg

# vgcreate vg0 /dev/md0 /dev/md1

# lvcreate -l 100%VG -n lv0 vg0

Then I formatted the new LVM partition with XFS:

# mkfs.xfs /dev/vg0/lv0

I mounted this at /var/nfs and exported it with the following options:

# cat /etc/exports

/var/nfs        192.168.10.3(rw,no_root_squash,sync)
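For comparison only (this is a hypothetical variant, not what I'm running): an async export lets the server acknowledge writes before they reach disk, which can look much faster but risks data loss if the server crashes. ESXi's NFSv3 client issues synchronous writes, so the sync option above is the safe choice. After editing /etc/exports, re-export with exportfs -ra.

```
# /etc/exports -- hypothetical 'async' variant, for comparison only:
# the server acks writes before they hit disk (faster, but unsafe on power loss)
/var/nfs        192.168.10.3(rw,no_root_squash,async)
```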


I was able to add this to my ESXi host using the vSphere Client as a new datastore called nfs01.


I then edited my VM through the vCenter web interface, adding a new 2.73 TB disk.


The guest OS is Windows Server 2012. Through the Disk Management interface, I initialized the disk as GPT and created a new volume. This took several minutes. Then I tried quick formatting the volume with NTFS. I cancelled this after about 4 hours. I then shrank the volume to 100 MB and formatted that instead. That succeeded after several minutes, but just creating a blank text document on this drive takes about 8 seconds.


The NFS server is plugged into the same gigabit switch as the ESXi server. Here are the ping times:


~ # vmkping nfs.qc.local

PING nfs.qc.local (192.168.10.20): 56 data bytes

64 bytes from 192.168.10.20: icmp_seq=0 ttl=64 time=0.269 ms

64 bytes from 192.168.10.20: icmp_seq=1 ttl=64 time=0.407 ms

64 bytes from 192.168.10.20: icmp_seq=2 ttl=64 time=0.347 ms


I ran an I/O benchmark tool and got these results (screenshot on Imgur).


At the same time, vCenter showed this performance data for the datastore (screenshot on Imgur).

I noticed that some I/O operations done locally on the NFS server are also slow. For example, "touch x" completes instantly, but "echo 'Hello World' > x" can take anywhere from 0 to 8 seconds to complete.

This is my first attempt at using NFS (my two ESXi hosts use local storage) so I'm not sure if any of this is normal.

2 Replies
davewelsh
Contributor

I figured out what was causing my issue: I didn't initialize the software RAID with the --assume-clean option. My arrays were resyncing the whole time.
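The resync progress shows up in /proc/mdstat. As a sketch, the snippet below pulls the percent complete out of a resync progress line; the sample line is hard-coded for illustration, since the real thing only exists while an array is syncing.

```shell
# On a real system: cat /proc/mdstat   (or: watch -n 5 cat /proc/mdstat)
# Sample resync progress line, hard-coded here for illustration:
line='[=>...................]  resync =  8.5% (167114944/1953383488) finish=146.2min speed=203584K/sec'

# Split on "resync = " and take the first field after it: the percentage
pct=$(printf '%s\n' "$line" | awk -F'resync = ' '{print $2}' | awk '{print $1}')
echo "$pct"   # prints 8.5%
```

Once the resync line disappears from /proc/mdstat, the mirror is fully synced and benchmarks should reflect real performance.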

My new virtual disk is now performing as expected, although I'd still be interested in hearing people's opinions on optimizing the setup.

AveryFreeman
Enthusiast

Edit:  There is a wiki page about --assume-clean that says it skips the initial resync and is likely to lead to data corruption (especially in parity RAID configs).  Rather than using the --assume-clean flag, if you are having performance issues shortly after creating the array, check /proc/mdstat and wait for your mirror to finish syncing before you benchmark the pool.

Reference:  Initial Array Creation - Linux Raid Wiki

This has been an interesting read and I wish other people had also responded to it:

I am curious about the differences between XFS and ZFS in the context of local VM datastores - primarily NFS/file-level storage, since it doesn't have as many long-run issues as iSCSI.

I have no problem finding references, threads, blog posts, etc. discussing the use of ZFS for local VM storage, but I can't really find any for XFS.

Why did you decide to use XFS instead of ZFS?  I think I read somewhere in passing that XFS is really good for VM storage due to its high I/O serialization, but I wonder how that compares to ZFS.

ZFS benefits from its high use of memory for ARC - I get over 1.2GB/s sequential read rates for a Windows 10 VM in CrystalDiskMark - but its write rates are abysmal with sync=always, at about 70MB/s.  I was thinking of adding an NVMe L2ARC and SLOG (one partitioned SM953), but I was hoping to get some data about XFS before I commit.

Another issue is that ZFS requires a lot of memory, which could be better utilized for more VMs.  Also, if you want the latest version of ZFS (v37), the only OS that supports it is Solaris 11.3 - honestly a nicely laid-out operating system, but I am fairly certain its development is dead, and I'm scared of being locked into a dying platform.  Other ZFS-capable operating systems don't have the latest version; they use an aging v28 ported from OpenSolaris, which is about 10-12 years old now.  XFS was created by SGI in the early 1990s - do you know if Red Hat is still developing XFS, since it is RHEL's primary file system?

Why CentOS 7?  It seems like the kernel they package with it is really, really dated, and you might be missing out on a lot of improvements...

Edit:  I am setting up a new array using mdadm and XFS on Ubuntu 16.04 LTS, on drives I have already benchmarked with ZFS on Linux (Ubuntu 16.04 LTS), Solaris 11.3, FreeBSD 11.1, and FreeNAS 11-U4.  I'll report back when I get some results.

The two drives are He8 8 TB drives, but they're still syncing (syncing 8 TB on these disks takes about 11 hours).  So far I have:

# dd if=/dev/zero of=/dev/md0 bs=8k count=120k

122880+0 records in

122880+0 records out

1006632960 bytes (1.0 GB, 960 MiB) copied, 2.24109 s, 449 MB/s

Seems about right for a dual-drive mirror of disks that read around 200 MB/s each.
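One caveat: without a sync flag, dd largely measures the page/buffer cache rather than the disks, which may explain a 449 MB/s figure on a mirror of ~200 MB/s drives. A variant that forces the data to disk before reporting the rate (the /tmp/dd_test path and sizes here are placeholders for illustration):

```shell
# conv=fdatasync makes dd call fdatasync() before reporting, so the
# measured rate includes flushing to stable storage, not just the cache.
dd if=/dev/zero of=/tmp/dd_test bs=8k count=1k conv=fdatasync

# Alternatively, oflag=direct (O_DIRECT) bypasses the cache on every write:
# dd if=/dev/zero of=/dev/md0 bs=1M count=1024 oflag=direct
```

On a RAID 1 mirror, every write goes to both disks, so sustained write throughput should land near a single drive's speed rather than the sum.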
