3 Replies Latest reply on Jun 16, 2020 2:12 AM by srodenburg

    multiple vmk ports for vSAN

    Luca82 Enthusiast

      Hi everyone, I have a question. In an "air-gap" configuration, as described in this link

      Advanced NIC Teaming | VMware® vSAN™ Network Design | VMware

      which network is actively used by vSAN traffic? I understand that vSAN doesn't use both vmk ports at the same time, but I don't understand which one is used (if no failure has happened!)

      Thank you

        • 1. Re: multiple vmk ports for vSAN
          depping Champion
          VMware Employees · User Moderators

          You can check that by looking at the esxtop stats for those vmkernel interfaces. I can't tell you, as it probably will be different for every deployment.
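A minimal sketch of that check, using the standard ESXi CLI (the vSwitch and interface names in the comments are just examples, not from the thread):

```shell
# List which vmkernel interfaces are tagged for vSAN traffic
esxcli vsan network list

# Show all vmk interfaces and their IP configuration
esxcli network ip interface list

# Launch esxtop, then press 'n' for the network view; the vSAN vmk
# with non-zero PKTTX/s and PKTRX/s counters is the one in active use
esxtop
```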


          To be honest, we typically do not encourage customers to use this configuration! Why would you air-gap it to begin with? It just complicates things.

          • 2. Re: multiple vmk ports for vSAN
            Luca82 Enthusiast

            thank you duncan

            • 3. Re: multiple vmk ports for vSAN
              srodenburg Hot Shot

              Well.... I actually have a very compelling argument PRO having (and encouraging) more than one (i.e. two) vmk's for vSAN. And it has nothing to do with bandwidth or load-balancing (which people often hope will happen).

              No, my argument is purely one of redundancy and vSAN datastore availability. In my experience, it has happened a couple of times already that when a switch was rebooted (for a firmware update, for example), and that switch happened to be the one actively used by the vSAN vmk, vSAN died completely for a while due to network partitioning.


              What happened several times is that a switch reboots and comes back up, and although its ports are "electrically speaking" up and ESXi sees the link as up, no data flows for a while. Most customers use two NICs as uplinks, so beacon probing is not possible. Therefore, all they have is link-state detection, and this is the stinker. The link is up, the fail-back takes place, but packets are not being forwarded yet, and sometimes this takes so long that all hosts, not seeing the vSAN datastore so to speak, freak out due to HA isolation (HA in vSAN uses the vSAN datastore) and start killing off and restarting all VMs. But the vSAN datastore is not there yet, so the carnage is complete.
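You can see both halves of this trap, link-state-only detection and fail-back, in the teaming policy. A sketch for a standard vSwitch (the vSwitch name is a placeholder; on a distributed switch the same settings live in the dvPortgroup teaming policy in vCenter):

```shell
# Inspect the failover policy of the vSwitch carrying the vSAN vmk
esxcli network vswitch standard policy failover get --vswitch-name=vSwitch1

# Typical output includes lines like:
#   Failure Detection: link    <- link-state only, no beacon probing
#   Failback: true             <- triggers the fail-back described above
```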


              I therefore wanted to contact you, Duncan, to discuss this, because this is a serious problem. But I might as well discuss it here. The crux of the matter is that it's so unpredictable. After one reboot, traffic starts flowing fast enough, so ESXi / vSAN only sees a "hiccup" and nothing happens. The next reboot kills the entire cluster because that time the switch did not start forwarding packets fast enough after its ports were "up", and the (d)vSwitch triggers its fail-back (hey, the other switch is back, let's go back to that one -> nobody home -> all vSAN nodes lose contact with each other long enough and crap out).


              If you want, Duncan, see VMware ticket "SR 20126162005": the folks at support did a good job finding out which "packet forwarding after a reboot" was "fast enough" and which was "too slow to start forwarding packets", thus leading to total network partitioning and making HA jump 10 feet in the air.


              This is really nasty (Russian roulette), and I therefore always create a second vSAN vmk on the other uplink (other switch). So no matter what happens, no matter how the first vmk flips out over being failed back to its original active uplink (where possibly nobody is home yet), there is always that second vSAN vmk to talk to the other nodes. So the vSAN datastore is always available somehow, and HA does not freak out and start its Isolation Response.
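Creating and tagging that second vmk can be sketched like this (the portgroup name, interface name, and addresses are placeholders for illustration):

```shell
# Create a second vmkernel interface on a portgroup whose active uplink
# goes to the other switch
esxcli network ip interface add --interface-name=vmk2 --portgroup-name="vSAN-B"
esxcli network ip interface ipv4 set --interface-name=vmk2 \
    --ipv4=192.168.2.11 --netmask=255.255.255.0 --type=static

# Tag it for vSAN traffic alongside the existing vSAN vmk
esxcli vsan network ip add --interface-name=vmk2

# Verify that both interfaces are now listed for vSAN
esxcli vsan network list
```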


              If one can afford the luxury, use a 4-NIC uplink setup for vSAN, because "more than 2" ports allow for beacon probing, which will not fall for the "link is up but no packets being forwarded yet" trap (insert Admiral Ackbar "It's a trap" meme).
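With three or more uplinks available, switching a standard vSwitch from link-state detection to beacon probing can be sketched as follows (vSwitch name is a placeholder; on a distributed switch this is set in the dvPortgroup teaming policy instead):

```shell
# Change failure detection from link-state to beacon probing;
# beacon probing needs at least 3 uplinks to give an unambiguous verdict
esxcli network vswitch standard policy failover set \
    --vswitch-name=vSwitch1 --failure-detection=beacon
```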


              There are possible issues with stretched clusters. Often, the Witness connects over L3 directly to the first vSAN vmk via static routes. If a second vSAN vmk is then created, and there is no way to reach the witness from THAT network, it won't work: Skyline Health will complain about the witness not being reachable via the second vmk. Going with vSAN witness traffic separation over vmk0 (or vmk2 in the case of VxRail) solves that. The witness might still be briefly unreachable when vmk0 suffers the same fate as a vSAN vmk due to the "link up but no packet forwarding yet" problem.
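Witness traffic separation can be sketched like this (the network and gateway addresses are placeholders; vmk0 is the usual management interface, vmk2 on VxRail):

```shell
# Tag a management-reachable vmk to carry witness traffic, so the witness
# check no longer depends on the vSAN data vmk's
esxcli vsan network ip add --interface-name=vmk0 --traffic-type=witness

# Static route from that vmk's network to the witness site
esxcli network ip route ipv4 add --network=192.168.100.0/24 --gateway=192.168.1.1
```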


              The problem of "link up but no packet forwarding yet" is mainly seen on very big, complex switches. Big-boy core switches, UCS Fabric Interconnects, etc. can take ages to reboot and can be slow to start forwarding packets even though their ports were "electrically up".


              Mail me or call me, Duncan, if you want to discuss this. I think VMware should encourage the use of dual vSAN vmk's, each using "the other switch" as its active uplink (with the remaining one as standby, of course). This effectively solves the problem (as does beacon probing). It should be part of the reference architecture, in my humble opinion.
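That mirrored active/standby setup can be sketched for a standard vSwitch as follows (portgroup and vmnic names are placeholders; on a vDS the same ordering is configured per dvPortgroup in vCenter):

```shell
# Portgroup for the first vSAN vmk: uplink to switch A active, switch B standby
esxcli network vswitch standard portgroup policy failover set \
    --portgroup-name="vSAN-A" --active-uplinks=vmnic0 --standby-uplinks=vmnic1

# Portgroup for the second vSAN vmk: mirrored order, switch B active
esxcli network vswitch standard portgroup policy failover set \
    --portgroup-name="vSAN-B" --active-uplinks=vmnic1 --standby-uplinks=vmnic0
```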


              Cheers and take care,