Experiencing poor NFS performance. VMWare blames EMC, and EMC blames VMWare. Setting that aside, I thought I could enable jumbo frames to gain a little performance. I have jumbo frames enabled all along the chain: on the SAN, on the switch, and on the vSwitch.
Here is my configuration:
I verified that jumbo frames were enabled on vSwitch and physical NICs on the ESXi host using:
esxcfg-nics -l
esxcfg-vswitch -l
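One more layer worth double-checking: the vmkernel port itself carries its own MTU setting, separate from the vSwitch and the physical NICs, and it is easy to miss. A rough sketch of checking and fixing it (commands must be run on the ESXi host; the `esxcli` form assumes ESXi 5.x or later, and vmk1 is the interface from this thread):

```shell
# List vmkernel NICs -- the MTU column for vmk1 must show 9000,
# not just the vSwitch and the physical NICs
esxcfg-vmknic -l

# On ESXi 5.x+ the vmkernel port MTU can be changed in place
esxcli network ip interface set -i vmk1 -m 9000

# Re-verify the vSwitch MTU afterwards
esxcfg-vswitch -l
```

If the vmk MTU shows 1500 while everything else shows 9000, that alone would explain the vmkping failures below.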
Normal pings work fine from the ESXi host to the SAN and back. When I attempt a vmkping with the -d (don't fragment) option and any payload larger than 1472 bytes (1500 minus the 28 bytes of IP/ICMP headers), it fails with:
sendto failed, message too long
We then took two laptops down to the switch, plugged them into ports configured for jumbo frames, enabled jumbo frames on the laptops, and were able to ping each other, the SAN, AND the ESXi host.
Now I fear I have a routing issue with the vmkernel ports on my ESXi host, but the routing table seems correct, and the network view in esxtop appears to show most traffic going over the correct vmk, since all my datastores reside on the SAN:
PORT-ID  USED-BY           TEAM-PNIC DNAME    PKTTX/s MbTX/s PKTRX/s MbRX/s %DRPTX %DRPRX
50331652 vmnic9            -         vSwitch1 2422.71 59.30  6694.41 31.18  0.00   0.00
50331653 Shadow of vmnic9  n/a       vSwitch1 0.00    0.00   0.00    0.00   0.00   0.00
50331654 vmk1              vmnic9    vSwitch1 2422.71 59.30  3709.22 29.68  0.00   0.00
Routes:
Network Netmask Gateway Interface
10.20.20.0 255.255.255.0 Local Subnet vmk1
10.20.21.0 255.255.255.0 Local Subnet vmk2
10.30.1.0 255.255.255.0 Local Subnet vmk0
default 0.0.0.0 10.30.1.254 vmk0
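For anyone reproducing this check, the table above can be dumped directly on the host; a sketch, assuming ESXi 5.x or later for the `esxcli` form (older hosts use the `esxcfg-route` form):

```shell
# List the vmkernel IPv4 routing table (ESXi 5.x+)
esxcli network ip route ipv4 list

# Equivalent on older ESX/ESXi builds
esxcfg-route -l
```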
I am now concerned that my NFS traffic might be going out the wrong interface, and I have no clue why jumbo frames fail with vmkping. If anyone has ideas on how to narrow this down, I would appreciate it.
Going by the image you posted of your network layout, your routing table does indeed look fine to me. You can use the tcpdump-uw utility to capture traffic and verify your pings or NFS traffic really go out that interface, e.g.: tcpdump-uw -i vmk1 -nn
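Since tcpdump-uw accepts standard pcap filter expressions, you can narrow the capture to just the traffic in question (a sketch; vmk1 and the NFS port 2049 are assumptions from this thread):

```shell
# Capture only ICMP on vmk1 -- confirms the vmkping echo requests
# actually leave this interface
tcpdump-uw -i vmk1 -nn icmp

# Capture only NFS traffic (TCP/UDP 2049) on the same interface
tcpdump-uw -i vmk1 -nn port 2049
```

If the ICMP packets show up on vmk0 instead of vmk1, that would point at a routing problem rather than an MTU problem.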
A couple of things off the top of my head:
- Have you rebooted the host after enabling jumbo frames? I've seen people mention they had to reboot the host for jumbo frames to take effect.
- Make sure your NIC firmware and drivers are up to date.
- Post the vmkping command you're executing
- Instead of vmkping also try esxcli network diag ping --df --interface vmk1 --ipv4 --size 8972 --host 10.20.20.10
- Play a bit with payload sizes like 8000, 4000, 1500, 1472
- Have you rebooted the host after enabling jumbo frames? I've seen people mention they had to reboot the host for jumbo frames to take effect.
This is the second place I have read this, so I am going to try this next.
- Make sure your NIC firmware and drivers are up to date.
They are up to date; VMWare had me update them as part of a recent support call about this same NFS performance issue.
- Post the vmkping command you're executing
vmkping -s 8784 -d 10.20.20.10 <-- This generates the "sendto failed, message too long" error.
Removing the -d option lets the ping go through, but as the KB states, the packets are being fragmented.
- Instead of vmkping also try esxcli network diag ping --df --interface vmk1 --ipv4 --size 8972 --host 10.20.20.10
I will try this when I schedule maintenance for the reboot.
- Play a bit with payload sizes like 8000, 4000, 1500, 1472
I tried all of those starting from 8000 down, and only 1472 worked, which tells me something on the ESXi side is still set to an MTU of 1500.
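That conclusion checks out arithmetically. A quick sketch of the header math (assumes IPv4 ICMP with no IP options, i.e. 20-byte IP header plus 8-byte ICMP header):

```python
# Relate vmkping -s payload sizes to the on-the-wire interface MTU.
# With -d (don't fragment), the payload plus 28 bytes of headers
# must fit in a single frame.

IP_HEADER = 20    # IPv4 header, no options
ICMP_HEADER = 8   # ICMP echo header
OVERHEAD = IP_HEADER + ICMP_HEADER  # 28 bytes total

def max_payload(mtu):
    """Largest -s value that fits unfragmented in one frame."""
    return mtu - OVERHEAD

def implied_mtu(payload):
    """MTU implied by the largest payload that succeeds with -d."""
    return payload + OVERHEAD

print(max_payload(9000))   # 8972 -> what should work with jumbo frames
print(max_payload(1500))   # 1472 -> what actually works in this thread
print(implied_mtu(1472))   # 1500 -> the effective MTU somewhere in the path
```

Since 1472 + 28 = 1500 exactly, the largest working payload pins the effective MTU of some hop at 1500, which is why checking the vmkernel port MTU (not just the vSwitch) matters.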
Thanks for your input on this.