VMware Cloud Community
jondehen
Contributor

Redundant Switches with NFS - Even Possible?

I am having a difficult time understanding how to set up fully redundant networks from an ESXi host to multiple NFS storage devices. Please see the attached image for this question.

Goal: Be able to lose any single ethernet link OR any single switch without interruption to the NFS datastore.

Thoughts:

  • I want to leave the switches UNstacked (so can't use LACP)
  • The NASes support ALB and static XOR
  • Scenario A is ideal, where the switches are totally independent and not connected
  • Using the default Route Based on Originating Virtual Port policy, traffic will only flow out of vmnic1 OR vmnic2, but not both at the same time (since I only have one VMkernel port). This means traffic to all NASes will use only vmnic1 or only vmnic2 at any given time.
  • Both switches are Cisco 10G
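
To illustrate why a single VMkernel port ends up on one uplink, here is a minimal Python sketch. It is a simplified model, not VMware's actual implementation: the modulo mapping and the `pick_uplink` name are assumptions made for illustration. The only point it shows is that one virtual port maps to exactly one uplink until that uplink fails.

```python
def pick_uplink(virtual_port_id, active_uplinks):
    """Simplified, hypothetical model of port-ID based teaming.

    A given virtual port always maps to the same uplink while the
    list of active uplinks is unchanged (modulo mapping is assumed).
    """
    if not active_uplinks:
        raise ValueError("no active uplinks")
    return active_uplinks[virtual_port_id % len(active_uplinks)]


# One VMkernel port (hypothetical port ID 0) with two healthy uplinks:
# every NFS flow from this port leaves through the same vmnic.
chosen = pick_uplink(0, ["vmnic1", "vmnic2"])

# If vmnic1 loses link, the port is re-pinned to the remaining uplink.
after_failure = pick_uplink(0, ["vmnic2"])
```

In this model a second uplink only carries traffic when another virtual port exists or when the first uplink goes down, which matches the behavior described in the bullet above.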

Problems:

Not being able to use both vmnic1 and vmnic2 at the same time seems to be a big issue for this scenario. The second hurdle is that VMware can't "see" when one of the two NAS links becomes disconnected. The host can only detect the state of its own links and the switches they attach to.

In scenario A, if all traffic is using vmnic1 and link 3 dies (but switch1 and link 1 are fine), then VMware will lose connection to NAS1 without failover, even though there is a physical path from NAS1 back to the host.
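
The failure case can be modeled as a small reachability check. This is only a sketch: the post names links 1, 3 and 7, so the numbering and endpoints of links 2 and 4 below are assumptions, and the `reachable` helper is hypothetical.

```python
from collections import deque

# Links 1, 3 and 7 follow the post; links 2 and 4 are assumed.
LINKS = {
    1: ("vmnic1", "switch1"),
    2: ("vmnic2", "switch2"),   # assumed
    3: ("NAS1", "switch1"),
    4: ("NAS1", "switch2"),     # assumed
    7: ("switch1", "switch2"),  # inter-switch link, scenario B only
}


def reachable(src, dst, links, failed=()):
    """Breadth-first search over the links that are still up."""
    adjacency = {}
    for number, (a, b) in links.items():
        if number in failed:
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


scenario_a = {n: ends for n, ends in LINKS.items() if n != 7}
scenario_b = LINKS

# Scenario A, link 3 down: the active uplink (vmnic1) cannot reach NAS1,
# although a path through vmnic2 still exists.
a_via_vmnic1 = reachable("vmnic1", "NAS1", scenario_a, failed={3})
a_via_vmnic2 = reachable("vmnic2", "NAS1", scenario_a, failed={3})

# Scenario B, link 3 down: link 7 lets vmnic1 reach NAS1 through switch2.
b_via_vmnic1 = reachable("vmnic1", "NAS1", scenario_b, failed={3})
```

Under these assumptions the host sees vmnic1 as healthy in scenario A, so nothing triggers a failover even though only the vmnic2 path still reaches NAS1.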

In scenario B, the NAS vendor has recommended against linking the switches together (link 7) due to unknown switch behavior in the event of a link failure. But I'm not seeing a way to avoid this...

Questions:

  1. How would scenario A be possible where I could lose links on the NASes and still have redundant connections? Or is link 7 really required?
  2. I don't believe adding a second VMK to the vSwitch and assigning each VMK to each vmnic would be beneficial in any way (because I believe VMware just picks one of the two VMKs only)
  3. I don't think I can use Beacon Probing because that will not detect NAS link disconnects (and really needs a different physical topology).  Right?  I don't think you can specify a list of targets to probe/test.
  4. Would a different load balancing algorithm be better than the default? I would ideally like to team them without switch configuration so that I could use 2x the bandwidth, but I don't see any options that allow that or offer clear advantages.
  5. Would NFS multi-pathing (included in 4.1) be a solution here?

Any insight into how to achieve the goal would be GREATLY appreciated!

chriswahl
Virtuoso

There's not much you can do in this scenario to address the problems. NFS is session based, so there is no way to distribute the data across multiple links without using NFS version 4.1 and the session trunking feature. In that case, multiple sessions are established from the host and logically treated as a single connection. If one session dies or the filer becomes unavailable on one path, the surviving sessions are maintained (assuming they can be reached via your other switch).

I rarely go through this sort of effort. Most filers support the creation of a VIP and advertise their VIP MAC address on the link of their choice. If link 3 were to die, the filer would detect the failure and migrate its VIP MAC to the other switch. Normal MAC table updates would inform the host to use the other vmnic.

Hope it helps.

VCDX #104 (DCV, NV) ஃ WahlNetwork.com ஃ @ChrisWahl ஃ Author, Networking for VMware Administrators