Hi everyone,
I have a standalone ESXi 5.1 host (not managed by vCenter) that I'm connecting to a Synology NAS server. I can mount the datastore successfully over the management network, but not over the separate subnet dedicated to the IP storage.
The management vmk0 port is on the 10.240.19.0/24 network.
The new VMkernel port I created for NFS (vmk1) is on 10.239.6.0/24, which is the subnet where our NAS resides.
I've double checked the switch ports on the Nexus 5Ks and verified all VLAN trunking is configured correctly both there and in ESXi. There's also no routing between vmk1 and the NAS - it's all layer 2.
The issue is that when I traceroute from my ESXi host to 10.239.6.11 (the IP address of the NAS), I'm seeing "traceroute: Warning: Multiple interfaces found; using 10.240.19.157 @ vmk0." So the traffic is going out the wrong VMkernel port. It should be going out vmk1 instead of vmk0.
When I check the routes I can see vmk1 listed as the interface for any traffic destined for 10.239.6.0/24. So why is this happening? Is this a known bug?
~ # esxcfg-route -l
VMkernel Routes:
Network          Netmask          Gateway        Interface
10.239.6.0       255.255.255.0    Local Subnet   vmk1
10.240.19.0      255.255.255.0    Local Subnet   vmk0
default          0.0.0.0          10.240.19.1    vmk0
Any help is greatly appreciated. I can provide more info as needed.
Is the vmk1 for NFS traffic on a separate vSwitch, or is it on the same vSwitch as the management network?
Also, the network vmk1 is on shouldn't be able to route to the other network. If both VMKs can route to each other, ESXi doesn't know which VMK to use and just picks one.
Hi JPM300,
The vmk1 port for NFS traffic is on a separate vSwitch than the management network. The management vmk0 port is on vSwitch0, while my vmk1 port for NFS is on vSwitch1 (along with a couple VM port groups for different data VLANs).
And no, there are no routes between these two networks. We didn't add any routes because we wanted the storage network to be isolated from management and only use Layer 2 to reach the ESXi host.
I appreciate the ideas. Are there any other configurations that may cause this? Is there any way (besides static routes) for me to say "when going to <this_IP>, always use <this_vmk>" ?
Thanks
Hmmm, that is odd. I know NFS doesn't yet have a way to bind to a NIC for failover like iSCSI does, so if you want failover you have to put two NICs in the vSwitch you use for the NFS VMkernel port and set the other NIC as standby. With iSCSI you can actually bind the traffic to a VMK, which helps with failover and MPIO. Also, with NFS, if you point to the share via IP, make sure you always use the IP; if you mount the NFS share via DNS name, make sure you always use the DNS name. The reason for this is that NFS creates the unique ID for the datastore based on the connection name. So if you connect an NFS store as 192.168.2.5/NFSShare, then disconnect it and reconnect it as NFSSERVER/NFSShare, it will think it's a new datastore, since it will generate a new ID for it.
Aside from that, it sounds like you are doing everything right. The only other thing I can think of is to do a vmkping from the console on the host to the IP address / DNS name of the NFS target and see if you can ping it, or do a tcpdump on the vmk traffic to figure out where the heck it's going/coming from: VMware KB: Capturing a network trace in ESXi using Tech Support Mode or ESXi Shell
If you are using jumbo frames, also check the frame size configuration on the physical switch to make sure that is working. As far as I know, ESXi grabs the lowest-numbered VMK available that can talk to the storage, so if you had vmk1 and vmk2 on the same network it would probably grab vmk1 and go on its merry way, which is why people just break it out onto another vSwitch / separate network. I can't think of anything else off the top of my head, but I will see what I can find.
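To spell out the vmkping / capture idea, something like the following from the host shell (a sketch using the OP's addresses; the -I flag to force a specific VMkernel port should be available on ESXi 5.1, and tcpdump-uw is present on 5.1 builds):

```shell
# Ping the NAS over the VMkernel stack
~ # vmkping 10.239.6.11

# Force the ping out a specific VMkernel port (ESXi 5.1+)
~ # vmkping -I vmk1 10.239.6.11

# Capture ICMP on vmk1 to see where the traffic is actually going/coming from
~ # tcpdump-uw -i vmk1 icmp
```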
~ # esxcfg-route -l
VMkernel Routes:
Network          Netmask          Gateway        Interface
10.239.6.0       255.255.255.0    Local Subnet   vmk1
10.240.19.0      255.255.255.0    Local Subnet   vmk0
default          0.0.0.0          10.240.19.1    vmk0
You need to set up a static route if you want NFS traffic to flow through vmk1. Since the default gateway is on the vmk0 network, traffic will pass through vmk0.
To enable static routing, please refer to this KB:
VMware KB: Configuring static routes for vmkernel ports on an ESXi host
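For reference, on ESXi 5.x a static route is managed from the shell with esxcfg-route; something like this (a sketch using the OP's subnet, with 10.239.6.1 as a hypothetical gateway on the storage network — substitute your own):

```shell
# List the current VMkernel routing table
~ # esxcfg-route -l

# Add a static route via a gateway on the vmk1 subnet
# (10.239.6.1 is a made-up gateway address for illustration)
~ # esxcfg-route -a 10.239.6.0/24 10.239.6.1

# Remove it again if it causes problems
~ # esxcfg-route -d 10.239.6.0/24 10.239.6.1
```

Note that a directly connected subnet like 10.239.6.0/24 is already covered by the "Local Subnet" entry for vmk1, so a static route should only be needed if the storage sits behind a router.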
I'd rather consider this a "display issue" with the traceroute command than a network issue. With the networking set up properly - i.e. no routing configured - you wouldn't be able to reach the storage over the wrong VMkernel port. See also http://virtualpatel.blogspot.de/2013/02/traceroute-and-vmotion-traffic-on.html
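One way to confirm it's only cosmetic is to force the probe out the storage VMkernel port; if this succeeds while plain traceroute prints the multiple-interfaces warning, the datapath itself is fine (a sketch using the OP's addresses, -I available on ESXi 5.1+):

```shell
# Bypass interface auto-selection and ping the NAS directly over vmk1
~ # vmkping -I vmk1 10.239.6.11
```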
André
Hi need2BaGeek,
Did you find out what was happening here? I am getting exactly the same problem.
Paul.
can you post ping and traceroute results to the storage? and the output of esxcfg-vswitch -l and esxcfg-route -l
Here's the result.
~ # esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         1536        6           128               1500    vmnic0,vmnic2

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VLAN3                 3        0           vmnic0,vmnic2
  VLAN1                 1        0           vmnic0,vmnic2
  Management Network    1        1           vmnic0,vmnic2

Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch1         1536        6           128               1500    vmnic1,vmnic3

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  VMkernel              0        1           vmnic1,vmnic3
~ # esxcfg-route -l
VMkernel Routes:
Network          Netmask          Gateway        Interface
10.44.128.0      255.255.255.0    Local Subnet   vmk0
10.44.129.0      255.255.255.0    Local Subnet   vmk1
default          0.0.0.0          10.44.128.1    vmk0
~ # traceroute 10.44.129.64
traceroute: Warning: Multiple interfaces found; using 10.44.128.68 @ vmk0
traceroute to 10.44.129.64 (10.44.129.64), 30 hops max, 40 byte packets
1 * * *
traceroute: sendto: Host is down
2 traceroute: wrote 10.44.129.64 40 chars, ret=-1
~ # ping 10.44.129.64
PING 10.44.129.64 (10.44.129.64): 56 data bytes
--- 10.44.129.64 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
Cheers,
Paul.
My issue turned out to be network related. Our ESXi blades run in an IBM H Series Blade Center. The fiber from the blade center runs to a Cisco Nexus 7k switch. The switch ports on the 7k did not have the correct VLAN (for isolating our IP storage) trunked to the blade center. Once the VLAN was trunked, it started working as expected.
I hope you resolve your issue soon. From my experience, it wouldn't hurt to double-check all the networking from your ESXi box to your NAS. Verify all network switch ports and the VLAN tagging in ESXi.
There is no VLAN specified for the VMkernel port group on vSwitch1. What VLAN is your NFS storage on? Are you expecting to use the native VLAN on the uplinks, and what should that be?
The ports on the switch for NFS are untagged in VLAN 2. Could this be a problem?
Yeah, so if your storage is VLAN 2 untagged (access ports), then you can either set VLAN 2 as the native VLAN on the uplinks to the ESXi hosts, or add VLAN 2 to the trunk (if it is not already on the trunk) and tag the VMkernel port.
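For the second option, the host-side change would look something like this (a sketch assuming the port group is named "VMkernel" on vSwitch1, as in your esxcfg-vswitch output above):

```shell
# Tag the VMkernel port group with VLAN 2
~ # esxcfg-vswitch -p "VMkernel" -v 2 vSwitch1

# Verify the VLAN ID now shows 2 for that port group
~ # esxcfg-vswitch -l
```

The switch side (adding VLAN 2 to the trunk, or making it native) still has to be done on the physical switch.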
It looks to have been a networking problem and not a problem with VMware.
I jumped to the conclusion it was VMware when I saw the traceroute warning indicating it was sending the data out the wrong interface.
Thank you for your help.
I was experiencing a similar issue, and this is how I resolved it:
In F2 > Configure Management Network > Network Adapters, make sure the list of connected NICs corresponds to the output of the "esxcli network nic list" command, i.e. that the vmnics with Link=Up are the only enabled network adapters. When different or additional network adapters are enabled, weird things happen.
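In other words, cross-check something like this from the host shell (a sketch; both commands exist on ESXi 5.x):

```shell
# List physical NICs and their link state -- note which vmnics show "Up"
~ # esxcli network nic list

# Compare against the uplinks actually assigned to each vSwitch
~ # esxcfg-vswitch -l
```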
