Loc2262
Contributor

Network performance issue: NFS datastore via Intel Gigabit NIC

Hello!

Please allow me to outline an odd performance issue I'm seeing on a newly set up server virtualization environment for my working group at university.

We're using one PC (Core i5 on an Intel mainboard) as a VMware ESXi 5 host. That PC has a single 500 GB HDD, used only for booting and for storing CD images.

A second PC (AMD platform) is used as a storage machine for VM HDD files. It's running Ubuntu Server 10.04 64-bit.

Both PCs are equipped with two Intel 82541PI Gigabit Ethernet NICs. One of them (eth0) connects to the "public network"; the other (eth1) is used for a dedicated, direct cable connection between the two PCs.

I have set up a software RAID-1 (mdadm) from two 2 TB HDDs on the storage PC and exported it via NFS.
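For reference, the array and export on the storage PC were set up roughly like this (device names, mount point and export options are reproduced from memory, so treat them as examples rather than the exact commands):

    # create the mirror from the two 2 TB disks (device names are examples)
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc
    mkfs.ext4 /dev/md0
    mount /dev/md0 /srv/vmstore

    # /etc/exports entry on the storage PC
    /srv/vmstore  192.168.1.0/24(rw,async,no_subtree_check)

    # re-read the exports table
    exportfs -ra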

On the host PC, that NFS export is mounted as a datastore. Jumbo frames (MTU 9000) are configured on the eth1 NICs of both machines.
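For completeness, the MTU was configured roughly as follows (vSwitch/vmk names are examples, and I'm quoting the commands from memory, so double-check the exact esxcli syntax on ESXi 5):

    # on the ESXi host: raise the MTU on the vSwitch and on the VMkernel interface used for NFS
    esxcfg-vswitch -m 9000 vSwitch1
    esxcli network ip interface set --mtu=9000 --interface-name=vmk1   # vmk1 assumed to be the NFS VMkernel port
    esxcfg-vmknic -l                                                   # verify the MTU took effect

    # on the Ubuntu storage PC
    ip link set dev eth1 mtu 9000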

Now comes the odd part: ESXi achieves only about 200 Mbit/s of throughput over the direct NFS connection to the storage PC. I would expect that to be considerably higher...

I can rule out the NIC itself, or the storage PC, as culprits:

  • When I transfer files from the outside to VMs running on the host, and the data is still being cached in the VM's memory, I see 600+ Mbit/s on the eth0 NIC. But the data is then written out to the NFS datastore over eth1 at only ~200 Mbit/s.
  • Furthermore, when I transfer files directly to the storage PC from a physical machine, I also get around 600 Mbit/s of throughput.

So it seems that the NFS / Intel NIC combination only performs badly when it is used for a datastore. Hmm, how come? And what can I do to improve the throughput?

I tried the following, to no effect:

  • Use iSCSI instead of NFS
  • Use ESXi 4.1 instead of 5
  • Turn off jumbo frames (I thought maybe the Intel NICs don't like that)
  • Use the eth0 switched network instead of the dedicated eth1 link

Any help, anyone? :-) If you need more information, please let me know!

3 Replies
Dave_Mishchenko
Immortal

With a VM you can transfer at 600 Mbps over eth0 (vSwitch0?), but if the same VM communicates directly with the NFS server over eth1 (vSwitch1?) it can only get 200 Mbps. Is that correct?

Loc2262
Contributor

That's half correct. :-)

vSwitch0 connects the VMs to the outside network. vSwitch1 sits on the direct cable between the host and the storage PC and is used only for the NFS datastore.

I transferred a file from an outside physical machine to a VM (whose HDD file is located on the NFS datastore). The traffic comes into the VM through vSwitch0 at about 600 Mbit/s. I see that rate while the first few hundred MB land in the VM's filesystem cache, i.e. before any actual datastore access takes place.

The VM itself is not communicating over vSwitch1. But since its HDD file lives on the NFS datastore, the test file is then written out from the host to the storage machine via vSwitch1, at only around 200 Mbit/s.

So it would seem that when VMware uses the Intel NICs for NFS/iSCSI datastore traffic (i.e. via the Management Network / VMkernel port), it achieves only about 200 Mbit/s, while the same NIC used for a VM network yields 600+ Mbit/s.
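(In case it matters: this is how I'm checking, from the Tech Support Mode shell on the host, which interface the datastore traffic actually goes through. Just the query commands, nothing exotic:)

    esxcfg-vmknic -l    # list VMkernel interfaces with their vSwitch, IP and MTU
    esxcfg-nas -l       # list NFS datastores and the server IPs they point to
    esxcfg-route -l     # check which route/interface the storage subnet is reached through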

Is it perhaps possible (or required) to inform VMware that the NIC is used/reserved for storage network purposes? I'm wondering whether this might be deliberate behavior, so as not to fully load a NIC with management traffic and leave no bandwidth for the VMs.

Or are there settings/parameters in the vSphere Client or the service console to tweak NFS / iSCSI / IP-storage performance in general?
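The only candidates I've found so far are the NFS/TCP advanced settings (Configuration > Advanced Settings in the vSphere Client, or esxcfg-advcfg in the shell). I'm just listing the query commands here, not recommending any particular values:

    esxcfg-advcfg -g /NFS/MaxVolumes       # read the current value
    esxcfg-advcfg -g /Net/TcpipHeapSize
    esxcfg-advcfg -g /Net/TcpipHeapMax
    esxcfg-advcfg -s 32 /NFS/MaxVolumes    # example of setting a value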

Loc2262
Contributor

I did some further tests.

1.

Transferring directly on the host with "dd": write rate about 25 MB/s, read rate about 35 MB/s (so roughly the same ~200 Mbit/s).
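(The dd invocation was along these lines, run from the host's Tech Support Mode shell against the datastore path; block size, count and the datastore name are from memory / examples:)

    # write test onto the NFS datastore
    dd if=/dev/zero of=/vmfs/volumes/nfs-datastore/testfile bs=1M count=2048
    # read test of the same file
    dd if=/vmfs/volumes/nfs-datastore/testfile of=/dev/null bs=1M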

2.

I created two separate NFS exports on the storage PC, on two different HDDs. Each is reached via its own NIC and IP from the storage PC to the VM host, and each is mounted there as a separate datastore. Then I tested directly on the host with "dd" again.

Result: the sum of the transfer rates over the two NICs is barely above 200 Mbit/s.
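(For reference, the second export/datastore was added roughly like this; IPs, paths and the datastore label are examples:)

    # additional /etc/exports entry on the storage PC (second disk, second NIC/IP)
    /srv/vmstore2  192.168.2.2(rw,async,no_subtree_check)
    exportfs -ra

    # mounted on the ESXi host as a second NFS datastore
    esxcfg-nas -a -o 192.168.2.1 -s /srv/vmstore2 nfs-datastore2
    esxcfg-nas -l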

What?! Is there a hard bandwidth limit in ESXi's NFS client or something? Why won't it go above 200 Mbit/s when there are separate datastores, separate IPs, separate NICs and separate NFS exports, residing on separate HDDs on the storage machine?

I'm thoroughly stumped now.
