VMware Cloud Community
need2BaGeek
Contributor

NFS storage traffic going out vmk0 instead of new VMkernel port

Hi everyone,

I have a standalone ESXi 5.1 host (not managed by vCenter) that I'm connecting to a Synology NAS server.  I can mount the datastore successfully over the management network, but not over the separate subnet dedicated to the IP storage.

The management vmk0 port is on the 10.240.19.0/24 network.

The new VMkernel port I created for NFS (vmk1) is on 10.239.6.0/24, which is the subnet where our NAS resides.

I've double checked the switch ports on the Nexus 5Ks and verified all VLAN trunking is configured correctly both there and in ESXi.  There's also no routing between vmk1 and the NAS - it's all layer 2.

The issue is that when I traceroute from my ESXi host to 10.239.6.11 (the IP address of the NAS), I'm seeing "traceroute: Warning: Multiple interfaces found; using 10.240.19.157 @ vmk0."  So the traffic is going out the wrong VMkernel port.  It should be going out vmk1 instead of vmk0.

When I check the routes I can see vmk1 listed as the interface for any traffic destined for 10.239.6.0/24.  So why is this happening?  Is this a known bug? 

~ # esxcfg-route -l
VMkernel Routes:
Network          Netmask          Gateway          Interface
10.239.6.0       255.255.255.0    Local Subnet     vmk1
10.240.19.0      255.255.255.0    Local Subnet     vmk0
default          0.0.0.0          10.240.19.1      vmk0

Any help is greatly appreciated.  I can provide more info as needed.
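In case it helps anyone looking at this: here's how I've been testing reachability. Forcing vmkping out a specific VMkernel port sidesteps whatever interface selection traceroute is doing (if I understand the tooling right, the -I flag is available on ESXi 5.1):

```
~ # vmkping -I vmk1 10.239.6.11    # force the ICMP echo out the NFS vmkernel port
```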

14 Replies
JPM300
Commander

Is the vmk1 for NFS traffic on a separate vSwitch, or is it on the same vSwitch as the Management Network?

Also, the network vmk1 is on shouldn't be able to route to the other network.  If both vmk ports can route between each other, ESXi doesn't know which vmk to use and just picks one.

need2BaGeek
Contributor

Hi JPM300,

The vmk1 port for NFS traffic is on a separate vSwitch than the management network.  The management vmk0 port is on vSwitch0, while my vmk1 port for NFS is on vSwitch1 (along with a couple VM port groups for different data VLANs).

And no, there are no routes between these two networks.  We didn't add any routes because we wanted the storage network to be isolated from management and only use Layer 2 to reach the ESXi host.

I appreciate the ideas.  Are there any other configurations that may cause this?  Is there any way (besides static routes) for me to say "when going to <this_IP>, always use <this_vmk>" ?

Thanks

JPM300
Commander

Hmmm, that is odd.  I know NFS doesn't yet have a way to bind to a NIC for failover the way iSCSI does, so if you want failover you have to put two NICs in the vSwitch you use for the NFS VMkernel port and set one as standby.  With iSCSI you can actually bind the traffic to a vmk, which helps with failover and MPIO.  Also, with NFS, if you point to the share by IP, make sure you always use the IP; if you mount the share by DNS name, make sure you always use the DNS name.  The reason is that NFS creates the unique ID for the datastore based on the connection name.  So if you connect an NFS store as 192.168.2.5/NFSShare, then disconnect it and reconnect it as NFSSERVER/NFSShare, ESXi will treat it as a new datastore and generate a new ID.

Aside from that, it sounds like you are doing everything right.  The only other thing I can think of is to do a vmkping from the console on the host to the IP address/DNS name of the NFS target and see if you can reach it, or do a tcpdump on the vmk traffic to figure out where the heck it's going/coming from:  VMware KB: Capturing a network trace in ESXi using Tech Support Mode or ESXi Shell
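For example, something along these lines from the ESXi shell (substituting your NAS address; as far as I know vmkping's -I flag and the tcpdump-uw utility are both there on ESXi 5.1):

```
~ # vmkping -I vmk1 10.239.6.11            # force the ping out the NFS vmkernel port
~ # tcpdump-uw -i vmk1 host 10.239.6.11    # watch whether NFS traffic actually uses vmk1
```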

If you are using jumbo frames, you can also bump up the frame size in a test ping to make sure that is working end to end.  As far as I know, ESXi grabs the lowest vmk available that can talk to the storage, so if you had vmk1 and vmk2 on the same network it would probably grab vmk1 and go on its merry way, which is why people just break storage out onto another vSwitch / separate network.  I can't think of anything else off the top of my head, but I will see what I can find.
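If jumbo frames are in play, a quick end-to-end check might look like this (a sketch assuming a standard 9000-byte jumbo setup; adjust MTU values and names to your environment):

```
~ # esxcfg-vswitch -m 9000 vSwitch1                    # MTU on the vSwitch
~ # esxcli network ip interface set -i vmk1 -m 9000    # MTU on the vmkernel port
~ # vmkping -I vmk1 -d -s 8972 10.239.6.11             # don't-fragment; 9000 minus 28 bytes of headers
```

If the -d ping fails while a normal vmkping succeeds, something along the path isn't passing jumbo frames.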

zXi_Gamer
Virtuoso

~ # esxcfg-route -l
VMkernel Routes:
Network          Netmask          Gateway          Interface
10.239.6.0       255.255.255.0    Local Subnet     vmk1
10.240.19.0      255.255.255.0    Local Subnet     vmk0
default          0.0.0.0          10.240.19.1      vmk0

You need to set up a static route if you want NFS traffic to flow through vmk1.  Since the default gateway is on the same network as vmk0, traffic will otherwise pass through vmk0.

To enable static routing, please refer to this KB:

VMware KB: Configuring static routes for vmkernel ports on an ESXi host
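Per that KB, adding and verifying a static route looks roughly like this (the 10.239.6.1 gateway is only an example; use whatever router actually sits on your storage network, if any):

```
~ # esxcfg-route -a 10.239.6.0/24 10.239.6.1    # route the storage subnet via a gateway on that network
~ # esxcfg-route -l                             # confirm the new route is listed
```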

a_p_
Leadership

I'd rather consider this a "display issue" with the traceroute command than a network issue. With the networking set up properly - i.e. no routing configured - you wouldn't be able to reach the storage over the wrong VMkernel port. See also http://virtualpatel.blogspot.de/2013/02/traceroute-and-vmotion-traffic-on.html

André

PK1234
Contributor

Hi need2BaGeek,

Did you find out what was happening here? I am getting exactly the same problem.

Paul.

vfk
Expert

Can you post ping and traceroute results to the storage, along with the output of esxcfg-vswitch -l and esxcfg-route -l?

--- If you found this or any other answer helpful, please consider the use of the Helpful or Correct buttons to award points. vfk Systems Manager / Technical Architect VCP5-DCV, VCAP5-DCA, vExpert, ITILv3, CCNA, MCP
PK1234
Contributor

Here's the result.

~ # esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         1536        6           128               1500    vmnic0,vmnic2

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VLAN3                 3        0           vmnic0,vmnic2
  VLAN1                 1        0           vmnic0,vmnic2
  Management Network    1        1           vmnic0,vmnic2

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1         1536        6           128               1500    vmnic1,vmnic3

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VMkernel              0        1           vmnic1,vmnic3

~ # esxcfg-route -l
VMkernel Routes:
Network          Netmask          Gateway          Interface
10.44.128.0      255.255.255.0    Local Subnet     vmk0
10.44.129.0      255.255.255.0    Local Subnet     vmk1
default          0.0.0.0          10.44.128.1      vmk0

~ # traceroute 10.44.129.64
traceroute: Warning: Multiple interfaces found; using 10.44.128.68 @ vmk0
traceroute to 10.44.129.64 (10.44.129.64), 30 hops max, 40 byte packets
 1  * * *
traceroute: sendto: Host is down
 2 traceroute: wrote 10.44.129.64 40 chars, ret=-1

~ # ping 10.44.129.64
PING 10.44.129.64 (10.44.129.64): 56 data bytes
--- 10.44.129.64 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss

Cheers,

Paul.

need2BaGeek
Contributor

My issue turned out to be network related.  Our ESXi blades run in an IBM H Series Blade Center.  The fiber from the blade center runs to a Cisco Nexus 7k switch.  The switch ports on the 7k did not have the correct VLAN (for isolating our IP storage) trunked to the blade center.  Once the VLAN was trunked, it started working as expected.

I hope you resolve your issue soon.  From my experience, it wouldn't hurt to double-check all the networking from your ESXi box to your NAS.  Verify the VLANs on all physical switch ports as well as the VLAN tagging in ESXi.
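In our case the fix was on the physical side; a couple of commands that would have caught it sooner (the Nexus interface name here is just a placeholder for whichever port faces your blade center):

```
switch# show interface ethernet 1/10 trunk    ! NX-OS: confirm the storage VLAN is in the allowed list

~ # esxcfg-vswitch -l                         # ESXi: check the VLAN ID on each port group
```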

vfk
Expert

There is no VLAN specified for the VMkernel port group on vSwitch1.  What VLAN is your NFS storage on?  Are you expecting to use the native VLAN on the uplinks, and what should that be?

PK1234
Contributor

The switch ports for NFS are untagged in VLAN 2. Could this be a problem?

vfk
Expert

Yeah, so if your storage is on VLAN 2 untagged (access ports), then you can either make VLAN 2 the native VLAN on the uplinks to the ESXi hosts, or add VLAN 2 to the trunk (if it is not already there) and tag the VMkernel port group.
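For the tagging option, the one-liner on a standard vSwitch would be something like this (port group and vSwitch names taken from the esxcfg-vswitch output you posted; adjust if yours differ):

```
~ # esxcfg-vswitch -v 2 -p "VMkernel" vSwitch1    # tag the VMkernel port group with VLAN 2
```

After that, vmkping the NFS target again to confirm.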

PK1234
Contributor

It looks to be a networking problem and not a problem with VMware.

I jumped to the conclusion it was VMware when I saw the traceroute warning indicating the data was being sent out the wrong interface.

Thank you for your help. 🙂

admin
Immortal

I was experiencing a similar issue, and this is how I resolved it:

In F2 > Configure Management Network > Network Adapters, make sure the connected NIC list corresponds to the output of the "esxcli network nic list" command, where the vmnics with Link=Up are the only enabled network adapters. When different or additional network adapters are enabled, weird things happen.
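For reference, the comparison I mean (output format varies a bit by build, so treat this as a sketch):

```
~ # esxcli network nic list    # note which vmnics show a Link status of Up
```

Then in the DCUI (F2 > Configure Management Network > Network Adapters), only those vmnics should be selected.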
