natedev
Contributor

Optimizing NFS performance with vSphere 5.1 using Oracle Sun ZFS Appliance

Hi all,

Would like to get input from members of the forum on optimizing NFS performance for vSphere 5.1. Performance has been good but I haven't found a lot of shared knowledge around Oracle Sun ZFS Appliances and vSphere 5.1 and would like to share what I've tried as well as hear what others are doing. Here's what I've got so far:

Networking

  • Brocade FastIron SX
  • Separate VLAN for NFS traffic
  • Jumbo Frames
  • Flow Control
  • STP disabled
  • LLDP enabled (wish the vNetwork Standard Switch would support this as the vNetwork Distributed Switch does)
  • CDP enabled (though the info doesn't seem to make it into vSphere despite "Both" setting - I just use this on the physical switch to confirm vSS configuration for the ESXi hosts since the info doesn't show up on the vSphere side)
  • 10 GbE ports for NFS, 1 GbE for storage appliance management

vSphere

  • vCenter Server 5.1b (947673), ESXi 5.1 (914609), vCloud Director 5.1.1.868405
  • vSphere Enterprise Plus licensing
  • vNetwork Distributed Switch 5.1 with 2 x 1 GbE uplinks per ESXi host configured for LACP Passive on vDS and Active on physical switch
  • Storage vDS is dedicated only for vmkernel traffic (VM networks are on separate vDS)
  • Each ESXi 5.1 host is configured to use the Storage vDS for vmkernel
  • All virtual machines and templates use 1 MB partition alignment offset
  • Windows virtual machines have 4k NTFS clusters
  • Mixed workloads - there are a few virtual machines with Microsoft SQL Server (extents are 8 x 8k) but the databases are not heavily utilized (more of a dev environment). Most virtual machines are J2EE app servers and client workstations used for testing.
  • Storage I/O Control enabled on each NFS-based Datastore
  • Direct I/O and Network I/O Control not possible with my ESXi hosts' physical adapters
  • Advanced Settings for each ESXi host: I don't really have more than 8 volumes so I didn't make any tweaks to NFS.MaxVolumes or to Net.TcpipHeapSize or Net.TcpipHeapMax.
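On the alignment point, here is a quick sanity check in Python that a 1 MB partition offset lands on a boundary for the block and record sizes mentioned in this thread. It is only a sketch: the function name is made up for illustration, and the 63-sector legacy offset is included purely as a contrasting example, not something taken from this environment.

```python
def is_aligned(offset_bytes: int, block_bytes: int) -> bool:
    """Return True when the partition offset falls on a block boundary."""
    return offset_bytes % block_bytes == 0


ONE_MB_OFFSET = 1024 * 1024   # offset used for the VMs and templates above
LEGACY_OFFSET = 63 * 512      # old 63-sector offset, for comparison only

# Check against 4k NTFS clusters and the candidate ZFS record sizes.
for size_kb in (4, 8, 16, 64, 128):
    block = size_kb * 1024
    print(f"{size_kb:>3}k  1MB offset aligned: {is_aligned(ONE_MB_OFFSET, block)}  "
          f"legacy offset aligned: {is_aligned(LEGACY_OFFSET, block)}")
```

Because 1 MB is a multiple of every power-of-two size up to 1 MB, the same offset stays aligned whichever of those record sizes the share ends up using.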

Storage

  • Oracle Sun ZFS Storage Appliance 7320
  • Two head units, clustered
  • Each head unit has 2 x 10 GbE ports though not LACPd because each head unit needs one active and the other for failover
  • Management of each head unit is through 1 x GbE ports on unit
  • Two disk shelves - each have 2 x SSD log devices, one with 20 x 600 GB 15K SAS drives and the other with 20 x 3 TB SAS drives
  • One pool for each disk shelf
  • Each pool is using Mirrored data profile
  • Each head unit is active for one pool (active/active)
  • Update access time on read: unchecked
  • Non-blocking mandatory locking: unchecked
  • Data deduplication: unchecked
  • Data compression: LZJB (fastest)
  • Cache device usage: All data and meta-data
  • Synchronous write bias: Latency
  • Database record size: 128k
  • NFSv3 is the only protocol in use (no other traffic to appliances other than management)
  • Each NFS share is accessed by a different IP address to promote load sharing on the vSphere side
  • If I were to use iSCSI in the future, I would probably go with 8k as the Volume block size

I'd love to see Oracle add VAAI in the future as Nexenta has done. Not sure if that's on their road map.

Anyone doing anything differently or have thoughts on how to optimize Oracle Sun ZFS Appliance performance in vSphere environments? I've read all the best practice docs from Oracle but they're pretty dated (still focused on vSphere 4.x) and not as thorough as what you see from NetApp.

Thanks,

Nate

abz01
Contributor

Hi Nate,

We use Nexenta with vSphere 5.1, but with 10GbE hosts as well.

You should probably change the record size of your NFS shares from 128 KB to 8 KB or 16 KB if you need IOPS more than throughput. As far as I/O patterns go, VMware VMDK files are comparable to iSCSI volumes. If your VMs also access NFS shares directly, you could/should set those shares back to 128 KB.
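To put rough numbers on that, here is a small Python sketch of why a large record size hurts small random writes. It uses a deliberately simplified model (an aligned guest write that touches part of a record causes the whole record to be rewritten) and ignores caching, compression and write coalescing, so treat the output as an illustration rather than a measurement. The function name is hypothetical.

```python
def write_amplification(io_size_kb: int, record_size_kb: int) -> float:
    """KB rewritten on the share per KB written by the guest.

    Simplified model: every record touched by an aligned write is
    rewritten in full, so a write smaller than the record pays for
    the whole record.
    """
    records_touched = -(-io_size_kb // record_size_kb)  # ceiling division
    return records_touched * record_size_kb / io_size_kb


# An 8 KB guest write (e.g. a database page) against different record sizes.
for record_size_kb in (8, 16, 64, 128):
    factor = write_amplification(8, record_size_kb)
    print(f"record size {record_size_kb:>3} KB -> {factor:.0f}x")
```

Under this model an 8 KB write costs 1x at an 8 KB record size, 2x at 16 KB and 16x at 128 KB, which is the trade-off between IOPS and sequential throughput described above.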

For the SQL Server VMs you could create a separate NFS share/volume with a 64 KB record size for optimal performance. We use Oracle with Direct NFS, which is easier and faster because it uses the NFS shares directly, without a local VMware disk in between.

I would keep NFS as the datastore backing instead of iSCSI. NFS is much easier to set up and more space efficient, and I don't believe there is much difference in performance.

Regards,

Dirk.

vogtmatt
Enthusiast

Sounds pretty well set up. I agree with Dirk about the record size. We've found that 16k is a pretty good sweet spot, as your I/O will be fairly random given that the majority of your VMs seem to be J2EE app servers. You could consider having one share with a higher record size for your higher-throughput/sequential I/O VMs and a lower record size for everything else (mixed loads).

Also, depending on what your 2 ZIL devices are, the 10 mirrors of 15k drives could overrun them. I've seen it.
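As a back-of-the-envelope way to think about that, here is a Python sketch comparing incoming synchronous write traffic against what the log devices can absorb. The 200 MB/s per-device figure is a made-up placeholder, not a spec for any real SSD, and the model assumes the worst case where a full 10 GbE link of writes is synchronous; plug in your own numbers.

```python
def log_headroom_mb_s(log_device_mb_s: float, log_devices: int,
                      mirrored: bool, sync_write_mb_s: float) -> float:
    """Spare log-device write bandwidth in MB/s (negative means overrun).

    Striped log devices add their bandwidth together; a mirrored pair
    writes everything twice, so it only delivers one device's worth.
    """
    usable = log_device_mb_s if mirrored else log_device_mb_s * log_devices
    return usable - sync_write_mb_s


LINK_MB_S = 10_000 / 8        # raw 10 GbE line rate, about 1250 MB/s
ASSUMED_LOG_MB_S = 200.0      # placeholder per-device figure, NOT a real spec

for mirrored in (False, True):
    headroom = log_headroom_mb_s(ASSUMED_LOG_MB_S, 2, mirrored, LINK_MB_S)
    layout = "mirrored" if mirrored else "striped"
    print(f"2 log devices, {layout}: headroom {headroom:+.0f} MB/s")
```

With those placeholder numbers the log devices come up short either way, which is the situation described above; real workloads rarely push a full link of sync writes, so measure before drawing conclusions.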

This is a good thread to start for ZFS/VMware in general, not even that specific to Oracle ZFS.

Cheers,

Matt

natedev
Contributor

Thanks - it's great to get some real world feedback on the record size. I definitely want to lean towards higher IOPS vs. throughput. I've seen conflicting advice on the subject but I put greater stock in what people who use vSphere say vs. those using Nexenta or Oracle Sun ZFS Appliances for something else. Great point about SQL Server, too.

Thanks again!

natedev
Contributor

Thanks for this reply. 8k and 16k seemed a lot more reasonable to me than what we were using. It was actually someone on Oracle's forums who recommended the 128K record size (but they might not have been using it as an NFS datastore for vSphere).

After using HP LeftHand iSCSI for years, I'm definitely a bigger fan of NFS, especially for vCloud Director where you have a ton of virtual machines on the same datastore. I like the fact that I'm getting some major space savings going this route and I can use deduplication if I like. In the past, I used VMware's Linked Clones in vCenter Lab Manager and found that they did not perform well. As a result, I decided not to enable Fast Provisioning in vCloud Director since it's pretty much the same thing (though it might perform better now with VMFS5, but again, I'm a convert to NFS).

Regards,

Nate

drick5200
Contributor

Are you still using this setup? There was an NFS VAAI plugin released for the 7320s back in September 2014. We have a setup almost identical to yours, except we use iSCSI instead of NFS.
