PonF
Contributor

Traffic flow with vMotion and vSAN between clusters?

I need help understanding how network traffic flows when using vMotion together with vSAN.

We have two clusters that both have vSAN. The VM I migrate has two copies of its disk (and one witness). How does the data flow from the sender to the receiver? Is this described somewhere? I can't find anything; a picture would be best...

I am doing this as troubleshooting because our vMotion is slow, so I need to understand how the files move. We use standard vSwitches and two 10 Gbit ports for vMotion, and should have 10+10 Gbit on the physical network between the clusters. I have noticed the following network traffic in VMware when migrating:

Sending cluster:
Host1 + Host2, which hold the VM disk, send at 50 MB/s on vSAN.
Host3, which runs the VM, receives data at 50-100 MB/s on vSAN.
Host3, which runs the VM, sends at 50+50 MB/s on vMotion.

Receiving cluster:
Host4 + Host5, which receive the VM disk, receive at 100 MB/s on vSAN.
Host6, which will be running the VM, sends data at 150-200 MB/s on vSAN.
Host6, which will be running the VM, receives data at 50+50 MB/s on vMotion.

Does this sound correct? Except that I only get 50+50 MB/s (a total of about 1 Gbit/s?). The migration takes 10 minutes, and all the values above stay around these levels during that time. The disk is set up as 80 GB (thin provisioned). The summary page for the VM says "Storage usage 94.17GB".
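As a rough sanity check on those numbers, the arithmetic can be sketched in a few lines of Python. This assumes the client reports decimal megabytes per second and that the 50+50 MB/s figure holds steady for the full 10 minutes; the variable and function names are only illustrative.

```python
# Rough sanity check of the vMotion figures quoted in the post.
# Assumption: rates are decimal MB/s (10**6 bytes); the client may use MiB/s.

def mb_per_s_to_gbit_per_s(rate_mb_s: float) -> float:
    """Convert megabytes per second to gigabits per second."""
    return rate_mb_s * 8 / 1000

vmotion_rate_mb_s = 50 + 50   # two vMotion ports at about 50 MB/s each
duration_s = 10 * 60          # the migration takes about 10 minutes
link_capacity_gbit = 10 + 10  # two 10 Gbit ports

throughput_gbit = mb_per_s_to_gbit_per_s(vmotion_rate_mb_s)
transferred_gb = vmotion_rate_mb_s * duration_s / 1000
utilisation = throughput_gbit / link_capacity_gbit

print(f"vMotion throughput: {throughput_gbit:.2f} Gbit/s")
print(f"Data moved in 10 minutes: {transferred_gb:.0f} GB")
print(f"Share of 10+10 Gbit capacity: {utilisation:.0%}")
```

Under these assumptions the combined rate comes out slightly below 1 Gbit/s, and the volume moved in 10 minutes is in the same range as the size of the 80 GB thin disk.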

Do Host1 and Host2 send BOTH mirrored disks through Host3, first over vSAN somehow and then out on the vMotion ports? If not, why is there vSAN traffic on all three nodes? And is it then the reverse on the other end?

I did an iPerf test between two VMs, one on each cluster, on the same network as vMotion, and it reported 5 Gbit/s. That was with the E1000 network card. I have not dared to test again with vmxnet yet because the hardware switches also handle the vSAN traffic. Maybe it could affect that...?

Tibmeister
Expert

vSAN really doesn't change the behavior of vMotion; vMotion will use the VMkernel interface it's enabled on. One thing I've noticed is that when you have more than one uplink (vmnic) in a vSwitch, you don't want them all active for both vSAN and vMotion. It can cause some odd behaviour. It's much better to have only a single uplink active. If you have a LAG configured, then that is your uplink, so it would be active, but depending on the port group's load balancing policy that may not be the best setup. If you are using the same vSwitch for both vSAN and vMotion, alternate the active uplink for each port group so each one has full access to the bandwidth without stomping on the other.

PonF
Contributor

I have two VMkernel adapters for vMotion on a vSwitch that has 2 ports, one active and one standby.

vSAN has its own vSwitch and ports. Only the physical switches between the clusters are shared with vMotion.

As I described in my first post, it looks like the traffic flows as in the picture at the link below, except that vMotion actually uses 2 ports and not one:
https://imgur.com/a/eYrk338 

Is this correct? Does vMotion send both copies of the disk (from Host1 and Host2) through Host3 and then out on vMotion? The migration is from Host3 to Host6, and in this case the data is automatically stored on Host1+2 and Host4+5.

Tibmeister
Expert

Since Host3 is the compute host, then yes, all traffic will go out through Host3. That means vSAN will send everything to Host3 and then out to Host6 if you are doing a Storage vMotion, which is what your diagram suggests, since it shows two different vSAN clusters.
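One way to compare the rates reported in the first post against this flow is a small model. This is only a sketch under stated assumptions, not a description of vSAN internals: it assumes the source compute host reads each block once, with reads spread evenly across the replicas, and that the destination compute host writes every block to each replica. The function name `expected_rates` and the dictionary keys are hypothetical.

```python
# Toy model of per-host traffic during a cross-cluster migration.
# Assumptions (not confirmed in this thread):
#   - the source compute host reads each block once, spread over the replicas
#   - the destination compute host writes each block to every replica

def expected_rates(vmotion_mb_s: float, replicas: int = 2) -> dict:
    """Return the expected MB/s on each leg for a given vMotion rate."""
    return {
        "source_replica_vsan_send_each": vmotion_mb_s / replicas,
        "source_compute_vsan_receive": vmotion_mb_s,
        "vmotion_between_clusters": vmotion_mb_s,
        "dest_compute_vsan_send": vmotion_mb_s * replicas,
        "dest_replica_vsan_receive_each": vmotion_mb_s,
    }

rates = expected_rates(50 + 50)  # 50+50 MB/s on the two vMotion ports
for leg, rate in rates.items():
    print(f"{leg}: {rate:.0f} MB/s")
```

If the figures in the first post are read as per-host rates, they fall close to what this model gives for a 100 MB/s vMotion stream, which would fit a single stream of data crossing between the clusters and being mirrored again on the receiving side.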

PonF
Contributor

Okay, thanks. As you say, I am moving a live machine between two vSAN clusters. So it sounds like I understand the traffic flow correctly.

But why migrate both copies from Host1 and Host2 to the other cluster? I'm thinking that if you wanted to reduce the data transferred by vMotion, you could move just one copy and then replicate it on the other cluster instead. But perhaps that is not a more secure or effective method? In my case I have the same vSAN policies on both clusters (I don't know what would happen if they were different).

Tibmeister
Expert

Not 100% sure, but I'd bet it's less "expensive" to copy the replica than to recreate it. I'm just spitballing there, though.
