VMware Cloud Community
obstmassey
Enthusiast

Poor ESXi 4 NFS Datastore Performance with Various NAS Systems

Hello!

In testing, I have found that I get between one half and one quarter of the I/O performance inside a guest when the ESXi 4 host connects to the datastore via NFS, compared to the guest mounting the exact same NFS share directly. However, I do not see this effect if the datastore uses either iSCSI or local storage. This has been reproduced with different systems running ESXi 4 and different NAS systems.

My testing is very simple. I created a bare-minimum CentOS 5.4 installation (fully updated as of 2010/04/07) with VMware Tools loaded, and I time the creation of a 256MB file using dd. I create the file either on the root partition (a VMDK stored in various datastores) or in a directory from the NAS mounted via NFS directly inside the guest.

My primary test configuration consists of a single test PC (Intel 3.0GHz Core 2 Duo E8400 CPU with a single Intel 82567LM-3 Gigabit NIC and 4GB RAM) running ESXi 4, connected to an HP ProCurve 1810-24G, which is connected to a VIA EPIA-M700 NAS system running OpenFiler 2.3 with two 1.5TB 7200RPM SATA disks in software RAID 1 and dual bonded Gigabit Ethernet NICs. However, I have reproduced this with different ESXi PCs and different NAS systems.

Here is the output from one of the tests. In this case, the VMDKs are in a datastore stored on the NAS via NFS:

-


root@iridium /# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 0.524939 seconds, 511 MB/s
real 0m38.660s
user 0m0.000s
sys 0m0.566s
root@iridium /# mount 172.28.19.16:/mnt/InternalRAID1/shares/VirtualMachines /mnt
root@iridium /# cd /mnt
root@iridium mnt# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 8.69747 seconds, 30.9 MB/s
real 0m9.060s
user 0m0.001s
sys 0m0.659s
root@iridium mnt#

-


The first dd is to a VMDK stored in a datastore connected via NFS. The dd completes almost immediately, but the sync takes almost 40 seconds! That's less than a 7MB-per-second transfer rate: very slow. Then I mount the exact same NFS share that ESXi is using for the datastore directly into the guest and repeat the dd. As you can see, the dd takes longer but the sync takes no real time (as it should for an NFS share with sync enabled), and the entire process takes less than 10 seconds: it's four times faster!
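To make that comparison explicit, the effective throughput including the trailing sync can be computed from the wall-clock times above (a quick sketch using the figures from the transcript; it assumes bc is installed in the guest):

# Effective throughput = bytes written / wall-clock time including the sync
echo "scale=1; 268435456 / 38.660 / 1000000" | bc    # NFS datastore: ~6.9 MB/s
echo "scale=1; 268435456 / 9.060 / 1000000" | bc     # direct guest NFS mount: ~29.6 MB/s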

I only see these results on datastores mounted via NFS. For example, here is a test run on the same guest running from a datastore mounted via iSCSI (using the exact same NAS):

-


root@iridium /# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 1.6913 seconds, 159 MB/s
real 0m7.745s
user 0m0.000s
sys 0m1.043s
root@iridium /# mount 172.28.19.16:/mnt/InternalRAID1/shares/VirtualMachines /mnt
root@iridium /# cd /mnt
root@iridium mnt# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 8.66534 seconds, 31.0 MB/s
real 0m9.081s
user 0m0.001s
sys 0m0.794s
root@iridium mnt#

-


And the same guest running from the internal SATA drive of the ESXi PC:

-


root@iridium /# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 6.77451 seconds, 39.6 MB/s
real 0m7.631s
user 0m0.002s
sys 0m0.751s
root@iridium /# mount 172.28.19.16:/mnt/InternalRAID1/shares/VirtualMachines /mnt
root@iridium /# cd /mnt
root@iridium mnt# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 8.90374 seconds, 30.1 MB/s
real 0m9.208s
user 0m0.001s
sys 0m0.329s
root@iridium mnt#

-


As you can see, the direct guest NFS performance in all three tests is very consistent. The iSCSI and local disk datastore performance are both slightly better than it, as I would expect. But the datastore mounted via NFS gets only a fraction of the performance of any of these. Obviously, something is wrong.

I have been able to reproduce this effect with an Iomega Ix4-200d as well. The difference is not as dramatic, but it is still sizeable and consistent. Here is a test from a CentOS guest using a VMDK stored in a datastore provided by an Ix4-200d via NFS:

-

root@palladium /# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 11.1253 seconds, 24.1 MB/s
real 0m18.350s
user 0m0.006s
sys 0m2.687s
root@palladium /# mount 172.20.19.1:/nfs/VirtualMachines /mnt
root@palladium /# cd /mnt
root@palladium mnt# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 9.91849 seconds, 27.1 MB/s
real 0m10.088s
user 0m0.002s
sys 0m2.147s
root@palladium mnt#

-


Once again, the direct NFS mount gives very consistent results. But using the disk provided by ESXi from an NFS-mounted datastore gives consistently worse results. They're not as terrible as the OpenFiler results, but the runs are consistently between 60% and 100% longer.

Why is this? From what I've read, NFS performance is supposed to be within a few percent of iSCSI performance, yet I'm seeing between 60% and 400% worse performance. And this is not a case of the NAS being unable to provide decent NFS performance: when I connect to the NAS via NFS directly inside the guest, I see dramatically better performance than when ESXi connects to the same NAS (the same share!) via NFS.

The ESXi configuration (e.g. network and network adapters) is 100% stock. There are no VLANs in place, etc., and the ESXi system only has a single Gigabit adapter. This is certainly not optimal, but it does not seem to me to explain why a virtualized guest gets so much better NFS performance than ESXi itself against the same NAS. After all, they are both using the exact same sub-optimal network setup...

Thank you very much for your help. I would appreciate any insight or advice you might be able to give me.

Accepted Solutions
mike_laspina
Champion

Hi All,

This is most definitely an O_SYNC performance issue. It is well known that VMware NFS datastores always use O_SYNC for writes, regardless of what the share has set as its default. As well, VMware uses a custom file-locking scheme, so you really cannot compare it to a normal NFS share connection from a different NFS client.

I have validated that performance will be good if you have target storage with sufficient reliable battery-backed or SSD cache.

Regards,

Mike

vExpert 2009

http://blog.laspina.ca/

50 Replies
obstmassey
Enthusiast

Please note that this is NOT related to NFS async exports. I have tested with both sync and async; the results recorded above are for sync, and here are the results for async. As you can see, writing to a guest-local disk on an NFS datastore still gets half the performance of a datastore on iSCSI or local disk.

-


root@iridium /# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }

256+0 records in

256+0 records out

268435456 bytes (268 MB) copied, 14.8515 seconds, 18.1 MB/s

real 0m20.576s

user 0m0.000s

sys 0m0.504s

root@iridium /# mount 172.28.19.16:/mnt/InternalRAID1/shares/VirtualMachines /mnt

root@iridium /# cd /mnt

root@iridium mnt# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }

256+0 records in

256+0 records out

268435456 bytes (268 MB) copied, 7.19249 seconds, 37.3 MB/s

real 0m8.299s

user 0m0.001s

sys 0m0.332s

root@iridium mnt#

-


This is very comparable to the performance seen on the Iomega Ix4-200d; so much so that it made me SSH into the Iomega NAS to see how the NFS shares are exported. They are exported async! :)

In any case, the dramatic performance decrease (half of the transfer rate) is consistent across both the Iomega Ix4-200d and OpenFiler 2.3. This performance decrease is not seen when the exact same NFS export is mounted directly inside a virtualized Linux guest...

Any ideas? Why does ESXi have such terrible NFS performance, when iSCSI and NFS direct to the guest (both to the exact same NAS) do not?

alubel
Contributor

Yep, same boat here. My take is that ESX(i) has no idea about your storage (BBWC, UPS, etc.) and enforces direct I/O no matter what over anything NAS (NFS, iSCSI).

VMware ESX 4 on a Sun 4170 --> 10Gb NIC --> Nexus 5k --> 10Gb NIC --> Sun 7110 = really poor write performance when using very small write sizes. As write sizes increase, IOPS decrease and we see the 10Gb bandwidth used, but with small writes it can barely even saturate a 1Gb link.

This came from trying to virtualize a SQL server we had where they were doing 1 insert at a time x 1 million inserts. On local Adaptec storage (512MB BBWC, 4-disk RAID 10) we see 6 seconds per 10k rows, but over NFS, no matter what, we see about 30 seconds per 10k.

I've been wondering whether adding some SSDs to my 7000 would change this.

vmwarefc
Contributor

Hi, Obstmassey,

It seems you are doing an interesting test. Today I also ran an experiment in my test environment. However, I did not see any outstanding difference in NFS share performance between the two situations. Please point out if any of my test steps are wrong. Thanks.

The testbed I used is as follows:

NAS array: FAS3140 10G,

Guest OS: redhat 4, 32bit

Case 1:

I created a RHEL4 VM on an NFS datastore, and then mounted a NAS share (referred to as second_share) from the NAS array inside the guest.

on /mnt, do # sync;sync;sync; time { dd if=/dev/zero of=test.txt bs=1M count=1024;sync;sync;sync;}

on /root/, do # sync;sync;sync; time { dd if=/dev/zero of=test.txt bs=1M count=1024;sync;sync;sync;}

Case 2: mounted the NAS share second_share on the ESXi host (the RHEL4 VM is on this ESXi).

on /mnt, do # sync;sync;sync; time { dd if=/dev/zero of=test.txt bs=1M count=1024;sync;sync;sync;}

on /root/, do # sync;sync;sync; time { dd if=/dev/zero of=test.txt bs=1M count=1024;sync;sync;sync;}

Zhifeng
obstmassey
Enthusiast

I don't think this has anything to do with write size: I'm using a 1MB block size with dd...

I've done other research that seemed to point a finger at the Linux NFS server. So, I've re-tested with other operating systems providing the NAS. I'm using a P4 2.8GHz machine with 2GB of RAM and a single 7200RPM SATA drive. I'm not looking for maximum performance, but rather for similar performance between ESXi and a Linux guest connecting via NFS.

First, FreeNAS 0.7.1.5113, which is based on FreeBSD. ESXi delivers MUCH WORSE performance with this than with OpenFiler! Yet the Linux guest running within ESXi gets very comparable results:

# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }

256+0 records in

256+0 records out

268435456 bytes (268 MB) copied, 0.936529 seconds, 287 MB/s

real 2m40.427s

user 0m0.001s

sys 0m0.992s

# mount 172.28.19.17:/mnt/InternalHD /mnt

# cd /mnt

# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }

256+0 records in

256+0 records out

268435456 bytes (268 MB) copied, 10.0552 seconds, 26.7 MB/s

real 0m10.922s

user 0m0.000s

sys 0m0.901s

#

That's right: creating a 256MB file on a VMDK provided by a FreeNAS datastore via NFS took nearly 3 MINUTES (I ran it 3 times), yet the Linux guest still takes 11 seconds. So it seems that a Linux-based NAS is much better with ESXi than FreeBSD. So much for the "poor Linux NFS server" being the source of the problem.

Next up: OpenSolaris. Will reply when the testing is done.

obstmassey
Enthusiast

vmwarefc,

I'm not exactly sure what you did. What I'm doing is:

  1. Create a Linux VM within a datastore provided by NFS

  2. Inside of that VM, mount via NFS the exact same NFS share that the datastore is using

  3. Run the dd command (which you got right) on a path provided by the "local" guest disk (/ is fine)

  4. Run the exact same dd command on the path mounted from the NFS share

  5. Compare the results (which you didn't give us! :) ). A consolidated sketch of these steps follows below.
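For reference, here is that procedure as a single shell session (the IP and share path are from my OpenFiler setup earlier in the thread; substitute your own):

# Step 3: time dd plus sync on the guest's "local" disk (the VMDK on the NFS datastore)
cd /
sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }

# Steps 2 and 4: mount the same share directly inside the guest and repeat
mount 172.28.19.16:/mnt/InternalRAID1/shares/VirtualMachines /mnt
cd /mnt
sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }

# Step 5: compare the two 'real' times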

I see that you're using a NetApp filer. These devices don't seem to have the same performance penalty on ESX or ESXi that literally EVERY other NFS NAS device I've been able to test with has. What I want to know is why! And is there any other acceptable NFS NAS besides a NetApp filer...

Thank you very much for your reply. I would love to know the actual results you got from this testing, if you don't mind sharing them.

kcucadmin
Enthusiast

This testing reflects what we saw in a real-world deployment of ESX 4 vSphere to 8 hosts, using a mix of iSCSI and NFS to mount datastores on an EMC Celerra NX4. I spent 2-3 months trying to hammer out performance issues on NFS stores, finally gave up, and just went all iSCSI. No problems since.

I can tell you, it was not hardware related. I heard several "theories" from EMC/VMware support:

Issues with NFS in 3.5 that were supposed to be resolved in 4.0 but aren't. The ESX host throttling NFS sessions. Buffers, cache, etc., etc., etc.

Bottom line: iSCSI just performed much better for us (AT LOAD). I was willing to take a 5-10% or even a 15-20% performance hit, but we saw much slower performance than that (AT LOAD). Which is a shame, because we totally lose snaps/dedup/thin provisioning, all the reasons we selected the Celerra.

I will say that most of the problems don't show up until you start to put some real load on the system. At load, iSCSI was 3-4x better than NFS.

mike_laspina
Champion

Hi All,

This is most definitely an O_SYNC performance issue. It is well known that VMware NFS datastores always use O_SYNC for writes, regardless of what the share has set as its default. As well, VMware uses a custom file-locking scheme, so you really cannot compare it to a normal NFS share connection from a different NFS client.

I have validated that performance will be good if you have target storage with sufficient reliable battery-backed or SSD cache.
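If you want to see this effect from inside a guest, force synchronous writes on the direct NFS mount and compare it to the default mount; a rough sketch (the share path is the one used earlier in this thread, and oflag=sync support in the guest's GNU dd is an assumption):

# Mount the share with sync semantics, roughly approximating the ESXi NFS client's behaviour
mount -o sync 172.28.19.16:/mnt/InternalRAID1/shares/VirtualMachines /mnt

# Or keep the default mount and have dd open the file O_SYNC instead
time dd if=/dev/zero of=/mnt/test.txt bs=1M count=256 oflag=sync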

Regards,

Mike

vExpert 2009

http://blog.laspina.ca/
kcucadmin
Enthusiast

Hi All,

This is most definitely an O_SYNC performance issue. It is well known that VMware NFS datastores always use O_SYNC for writes, regardless of what the share has set as its default. As well, VMware uses a custom file-locking scheme, so you really cannot compare it to a normal NFS share connection from a different NFS client.

I have validated that performance will be good if you have target storage with sufficient reliable battery-backed or SSD cache.

Regards,

Mike

vExpert 2009

(Edit: after reading your blog URL, I think you are demonstrating exactly what we saw.)

Without getting into the nitty gritty, NFS has some issues that need to be resolved before we could use it in a production environment. Since I didn't build my own NAS, I doubt I can change the type of cache, or use an accelerator board, or ZFS and its ZIL. But yeah, EMC was definitely smoking something the day they said "sure, put 30-40 VMs on this NFS NAS" we spent over 45k on.

obstmassey
Enthusiast

Does O_SYNC override an NFS share flagged with async? I can't seem to find a concrete answer to this. According to my testing, it would seem that it does: even with async, I'm seeing terrible transfer rates.

I'm assuming that NetApp filers get the level of NFS performance they do because they consider writing data to their battery-backed cache sufficient integrity to meet the O_SYNC requirement. I'm really surprised to hear that EMC devices via NFS fall into the same black hole as the rest of the NASes I've been testing! :)

It really does seem that if you don't have a NetApp device, you're stuck with iSCSI. The big downside there is that you have to depend on ESX to provide all of the facilities for maintaining your storage: snapshots, backups, etc. I do not like that. I don't like having to deal with opaque block devices rather than a normal file system (such as one provided over NFS).

Besides NetApp, are there any NFS servers that can provide performance within, say, 15% of that of iSCSI?

alubel
Contributor

On my Sun ZFS-backed 7000 series I'm getting better performance with NFS overall.

Reads are up to 700MB/sec with a 2k blocksize (sequential)

Writes are 350MB/sec with 2k (sequential)

Also, doing write tests with "dd" is not at all realistic; try vdbench, and thank Henk!

sync;sync;sync;
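For anyone who wants to try it, a minimal vdbench run for a large sequential write could look something like this (a sketch only; the test-file path and the vdbench install directory are assumptions, not something from this thread):

# Write a minimal vdbench parameter file and run it
cat > seqwrite.vdb <<'EOF'
sd=sd1,lun=/mnt/vdbench.test,size=1g
wd=wd1,sd=sd1,xfersize=1m,rdpct=0,seekpct=0
rd=rd1,wd=wd1,iorate=max,elapsed=60,interval=5
EOF
./vdbench -f seqwrite.vdb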

mike_laspina
Champion

obstmassey,

VMware's NFS client will mount the server share with noac, and thus it forces sync mode even if async mode is set at the server.

NetApp filers will act upon a cached write by returning a completed write operation back to the VMware NFS client, and thus performance will be good. If the cache is full, then you're at physical disk speed once more and performance degrades significantly.

S7000-series Sun/Oracle heads will perform extremely well as an NFS head and are proving to be a threat to NetApp sales. Many higher-end EMC models will perform well, but I would hazard a guess that the lower-end models will fall short in many cases (NVRAM cache is costly).

If your ESX host has excess memory, be sure to set your NFS.maxshares to 256, as it will improve buffering and file-lock cache resources.
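For reference, ESX advanced options can be queried and changed from the host console; a rough sketch (this assumes esxcfg-advcfg is available on your ESX/ESXi 4 build, /NFS/MaxVolumes is shown only to illustrate the query pattern, and the exact option name and value range for the buffering and lock tuning mentioned above should be verified against your version):

# Query and set an NFS advanced option from the host console
esxcfg-advcfg -g /NFS/MaxVolumes       # show the current value
esxcfg-advcfg -s 32 /NFS/MaxVolumes    # example: raise the NFS mount limit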

vExpert 2009

http://blog.laspina.ca/
alubel
Contributor

Yep. I can't wait to see what things will be like once we get some Logzilla SSDs at 18+ GB apiece.

I'm really not a big fan of NetApp. I think they are overpriced commodity boxes with commodity hardware and a proprietary OS/filesystem that can't come close to the BUI+ZFS in the Fishworks stuff. :) If you look at what they use in terms of OSS/GPL etc. versus what they give back, it's not very much, if anything at all! But now Sun is Oracle... who knows what the future holds for Solaris and ZFS.

J1mbo
Virtuoso

Bear in mind that a network file system is exactly that: the smallest addressable unit is 4KB, or whatever has been configured, whereas iSCSI is sector-based, so its smallest unit is 512 bytes. Alignment throughout the various levels is absolutely critical to drive decent numbers from NFS-based storage, to avoid the read-update-write process at the ESX level, which has a devastating impact on write performance.
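As a quick way to check guest partition alignment (a sketch; the device name is just an example):

# List partitions in sector units; a start sector divisible by 8
# (8 x 512 bytes = 4KB) sits on a 4KB boundary and avoids read-update-write
fdisk -lu /dev/sda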

Please award points to any useful answer.

GTMK
Contributor

Could anyone comment on the performance of Isilon systems compared to NetApp and Sun/Oracle?

Georg.

vmwarefc
Contributor

I have no more time to work on this right now. I will update with the data once I get time to do so.

Thanks

Zhifeng

obstmassey
Enthusiast

As promised, here are the results with OpenSolaris (2009.06, installed as-is). First, a note: the ESXi box was also copying some files at the same time, so the performance cannot be compared to the previous results. It doesn't really matter: all I'm interested in is the relative difference between VMDK and direct NFS performance. In short, it's as bad as Linux:

-


# mount 172.28.18.88:/export/home/tmassey /mnt

# sync; sync; sync; time { dd if=/dev/zero of=test.fil bs=1M count=256; sync; sync; sync; }

256+0 records in

256+0 records out

268435456 bytes (268 MB) copied, 64.9355 seconds, 4.1 MB/s

real 1m13.234s

user 0m0.000s

sys 0m0.490s

# cd mnt

# sync; sync; sync; time { dd if=/dev/zero of=test.fil bs=1M count=256; sync; sync; sync; }

256+0 records in

256+0 records out

268435456 bytes (268 MB) copied, 21.2836 seconds, 12.6 MB/s

real 0m23.407s

user 0m0.001s

sys 0m0.367s

#

-


As expected, NFS datastore performance is terrible: roughly a quarter of what it should be.

All of these tests have been performed against a NAS with simple SATA drives. I'm going to repeat the OpenFiler test using a machine with four 10k drives in a RAID 10 configuration with an IBM ServeRAID 6M controller with battery-backed RAM. We'll see if it improves performance with NFS.

J1mbo
Virtuoso

What are your mount and export options?

Please award points to any useful answer.

obstmassey
Enthusiast

OK, test results with a battery-backed RAID controller:

IBM x326 server with 3.5GB RAM, a ServeRAID 6M with 128MB battery-backed cache, 4 x 36GB 10,000 RPM drives in RAID 10, and a 2.8GHz P4 Xeon.

# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }

256+0 records in

256+0 records out

268435456 bytes (268 MB) copied, 4.48282 seconds, 59.9 MB/s

real 0m7.554s

user 0m0.001s

sys 0m0.488s

# mount 172.28.19.17:/mnt/InternalRAID10/shares/VirtualMachines /mnt

# cd mnt

# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }

256+0 records in

256+0 records out

268435456 bytes (268 MB) copied, 6.31535 seconds, 42.5 MB/s

real 0m6.658s

user 0m0.001s

sys 0m0.341s

#

Now that's more like it! So, NFS datastore performance is only going to be improved by something that can "overcome" (more accurately, quickly satisfy) O_SYNC. Let's see what happens if we try to overwhelm the relatively small cache on this older RAID controller with a 1GB dd:

# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=1024; sync; sync; sync; }

1024+0 records in

1024+0 records out

1073741824 bytes (1.1 GB) copied, 35.7015 seconds, 30.1 MB/s

real 0m37.151s

user 0m0.004s

sys 0m1.888s

# cd mnt

# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=1024; sync; sync; sync; }

1024+0 records in

1024+0 records out

1073741824 bytes (1.1 GB) copied, 27.9186 seconds, 38.5 MB/s

real 0m28.051s

user 0m0.000s

sys 0m1.300s

#

The performance difference is beginning to widen here. But if this doesn't demonstrate that even older decent drives and an older decent RAID controller make a big difference, I don't know what does! :) For comparison, here is the performance directly on the server:

# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }

256+0 records in

256+0 records out

real 0m4.582s

user 0m0.016s

sys 0m0.912s

#

# cd /mnt/InternalRAID10/shares/VirtualMachines

# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=1024; sync; sync; sync; }

1024+0 records in

1024+0 records out

real 0m24.245s

user 0m0.016s

sys 0m3.300s

#

OK. The moral of the story: NFS datastores only work well with top-end storage processing, and fast drives don't hurt either.

obstmassey
Enthusiast

Mount options are, in every case but one, the default. The only exception was described earlier in this thread: an OpenFiler test where the NFS export was forced async. There was a performance boost, but the NFS datastore still only had half the performance of the native NFS mount. Which seems odd: if ESXi is forcing O_SYNC, I wouldn't have expected that much of an improvement.

However "improvement" is a relative term. Half the performance is better than a quarter, but still terrible.

And in the case of the real server with real drives and a real RAID controller, the export was sync. For the record, here are the exact export options:

/mnt/InternalRAID10/shares/VirtualMachines 172.28.16.0/255.255.252.0(rw,anonuid=99,anongid=99,secure,no_root_squash,wdelay,sync)

(I think I forgot to mention: I put OpenFiler 2.3 on the IBM x236.)

Unfortunately, mount options aren't enough to fix the poor NFS performance of ESXi with a commodity NAS. And with the right hardware, the mount options don't matter much.
