Dear all,
We are having trouble configuring a vDS properly in vSphere / vCenter Server 6.5 so that network traffic fails over when a VM loses its network connection.
Our environment:
The HA feature works properly:
What is not working is the following: the VM is running on one host, and we disconnect the network cable from the NIC that this host uses for the vDS. We expect inbound and outbound network traffic to keep flowing transparently (through the NIC of the other host that is part of the vDS). Instead, we can no longer ping the VM, and the VM can no longer ping the outside world.
Do you have any idea how to fix this?
Is a vDS the appropriate feature for handling this type of failure (we assume that the purpose of teaming on a vDS is to route traffic through the other NICs in the vDS that are still up)?
Thank you very much.
Regards.
What is not working is the following: the VM is running on one host, and we disconnect the network cable from the NIC that this host uses for the vDS. We expect inbound and outbound network traffic to keep flowing transparently (through the NIC of the other host that is part of the vDS). Instead, we can no longer ping the VM, and the VM can no longer ping the outside world.
Yes, and that's correct behavior. If you have only a single physical NIC on each ESXi host functioning as the uplink to the vDS, you don't have two uplinks per host, you have just one. When you disconnect that vmnic on host A, there is no way for traffic to be routed from host B over to host A to compensate. This isn't how a vDS is supposed to work. If you wish to have this type of protection against single vmnic failures, you must add a second vmnic per host to the vDS and team them.
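To make the point concrete, here is a tiny model of that rule in Python. It is only a sketch and does not use any vSphere API; the `Host` class, the host name `esx-a`, and the `fail_link` helper are made up for illustration. The only thing it encodes is that a VM can reach the physical network solely through the uplinks of the host it is running on.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical model of one ESXi host attached to a vDS."""
    name: str
    # vmnic name -> link state (True means the physical link is up)
    uplinks: dict = field(default_factory=dict)

    def vm_traffic_ok(self) -> bool:
        # Only this host's own uplinks count; uplinks on other hosts
        # in the same vDS cannot carry traffic for VMs running here.
        return any(self.uplinks.values())


def fail_link(host: Host, vmnic: str) -> None:
    """Simulate pulling the cable on one vmnic."""
    host.uplinks[vmnic] = False


# One uplink per host: pulling the cable isolates the VMs on that host.
single = Host("esx-a", {"vmnic0": True})
fail_link(single, "vmnic0")

# Two teamed uplinks per host: the surviving vmnic keeps carrying traffic.
teamed = Host("esx-a", {"vmnic0": True, "vmnic1": True})
fail_link(teamed, "vmnic0")

print(single.vm_traffic_ok())  # False
print(teamed.vm_traffic_ok())  # True
```

With a single uplink, one failure takes the host's VM traffic down; with two teamed uplinks, it takes two simultaneous failures on the same host.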
Thank you for this quick reply.
We had assumed that the VM might eventually be moved to the second host to recover network connectivity (using the vMotion network or the Mgmt network).
So apparently we need to add more uplinks to the vDS, as you mentioned. The issue is that if all uplinks on the host are down, we will end up with the same result (although the probability is much lower...): the VM will remain active, will not be moved to the other host, and the service provided by that VM will be unreachable.
My question then is: how can we automatically move a VM to another host if we lose all VM network links on the original host (apart from the vMotion and Mgmt networks)?
I was thinking about a component/script running inside the VM that monitors some external IP and uses a VMware automation toolkit to report an issue and trigger Proactive HA. Is that a possible solution to this problem (though it would require some development, I guess...)?
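A rough sketch of what the monitoring half of such a script could look like, in Python. Everything here is an assumption for illustration: the `LinkWatchdog` class, the threshold of three consecutive failures, and the `on_failure` callback are hypothetical, and the callback is only a placeholder. The part that would actually report to vCenter is not shown, because it is unclear which interface, if any, could be used for that.

```python
import subprocess
import sys
import time
from typing import Callable


def ping_once(target: str, timeout_s: float = 2.0) -> bool:
    """Send a single ICMP echo with the system ping command."""
    count_flag = "-n" if sys.platform.startswith("win") else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", target],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


class LinkWatchdog:
    """Calls on_failure once after `threshold` consecutive failed probes."""

    def __init__(self, probe: Callable[[], bool], threshold: int,
                 on_failure: Callable[[], None]) -> None:
        self.probe = probe
        self.threshold = threshold
        self.on_failure = on_failure  # placeholder for the reporting step
        self.failures = 0
        self.alerted = False

    def tick(self) -> bool:
        """Run one probe; return True if this tick raised the alert."""
        if self.probe():
            # Connectivity is back: reset so a later outage alerts again.
            self.failures = 0
            self.alerted = False
            return False
        self.failures += 1
        if self.failures >= self.threshold and not self.alerted:
            self.alerted = True
            self.on_failure()
            return True
        return False


def run(watchdog: LinkWatchdog, interval_s: float = 5.0) -> None:
    """Poll forever; intended to run as a service inside the VM."""
    while True:
        watchdog.tick()
        time.sleep(interval_s)
```

For example, `run(LinkWatchdog(lambda: ping_once("192.0.2.1"), 3, report_problem))` would poll a gateway every five seconds, where `192.0.2.1` is a documentation address and `report_problem` is a function we would still have to write. Requiring several consecutive failures avoids reacting to a single lost ping.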
Are there other, more elegant solutions?
It is nice to have an HA feature that restarts the VM when the server is down, but if nothing happens when all VM NICs are down, it only partially solves the problem...
Best regards
My question then is: how can we automatically move a VM to another host if we lose all VM network links on the original host (apart from the vMotion and Mgmt networks)?
You don't. HA has no response for this scenario because the VM itself is still accessible. If you're really trying to guard against this, you must provide resiliency for the networking over which the VM communicates. Plus, what's the likelihood that all links would be down for all VM traffic-related switches while they stay up for all kernel services? That's a pretty unlikely scenario.