We are looking to implement a stretched vSAN environment with up to 30 ESXi hosts (15 per site) and I am wondering if it is possible to also implement a stretched vCenter High Availability (VCHA) 'cluster' across the same infrastructure?
Assuming it is... Is it possible/supported to use the same 'back end' witness networks for both vSAN and VCHA traffic or should they be kept separate from each other?
We will use vSphere 6.7 U2 for the ESXi hosts & the vCenter VCSA (Embedded PSC).
We will be using either VMware NSX or Cisco ACI for the VXLAN 'spanned' subnets between the two main sites.
The two main sites are approx 30 km apart from each other. For the vSphere Management, vMotion & vSAN traffic we will dedicate two diverse dark fiber paths with an aggregate bandwidth of 80 Gbps (2 x 40 Gbps). The RTT latency over this link is sub 5 ms.
The third witness site is approx 40 km from site A and approx 50 km from site B and is connected to both via a 10 Gbps MPLS. Again, the RTT latency is sub 5 ms.
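As a rough sanity check on those latency figures, the sketch below estimates the minimum RTT that the quoted distances alone would impose. It assumes light travels through fibre at roughly 200,000 km/s (about 5 microseconds per km one way), and the helper name `min_rtt_ms` is made up for illustration. Real fibre routes are normally longer than the site-to-site distance, and switching and MPLS routing add delay on top, so treat the results as lower bounds only.

```python
# Rough propagation-delay estimate for the distances quoted in the post.
# Assumption: signal speed in fibre is about 200,000 km/s, i.e. 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time (ms) over a fibre path of the given length."""
    return 2 * distance_km / FIBRE_KM_PER_MS


# Distances taken from the post; the actual cable routes are probably longer.
links_km = {"site A <-> site B": 30, "site A <-> witness": 40, "site B <-> witness": 50}

for name, km in links_km.items():
    print(f"{name}: {km} km -> at least {min_rtt_ms(km):.2f} ms RTT")
```

On these assumptions the distance itself accounts for well under 1 ms on each link, so most of whatever RTT is actually measured would come from equipment and routing rather than from the fibre length.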
Any pointers/things to consider?
When you say "use the same 'back end' witness networks", are you referring to (1) using the same 10 Gbps links for both witness networks, or (2) using the same VXLAN for the vSAN Witness Host vSAN Network & the VCHA HA Network?
(1) Not an issue, they can use the same physical links.
(2) I guess it's technically possible, but I'm not sure why you would unless you have a specific reason?
Check this link:
We have the same stretched vSAN setup as you. We had VCHA running, but got rid of it after a couple of months. You have to ask yourself why you want it? What are you trying to achieve?
For us, it crashed many times, was out of sync a few times, and when it comes to upgrading you have to blow it away and reconfigure it each time. Every time there was a failover, it was at least 5 minutes before it was available again. Totally worthless, when vCenter reboots in less than 30 seconds and is available again.
Maybe once it's out of "beta" mode in future versions we will look at it again.
Thank you all for your help guys! Sorry for not replying sooner, I took a day off for Christmas shopping!
T180985 - #2 - Yes, there is a specific reason. Our network services group cross-charges for new subnet/routing planning/due diligence, so the fewer new subnets a project needs, the better it is for my project budget!
depping - If you cannot get an answer, then us mere mortals have little chance! :-). Cheeky ask (as it is nearly Christmas) could you use your influence to reach out to Adam Eckerle and/or Emad Younis for an officialish answer?
AutoEng - I have read about the downsides (i.e. blow away for upgrades, etc.), but our VCSAs take at least 10 minutes to boot and stabilise. I have never seen a VCSA boot to a usable state in 30 seconds - what is your secret? What I want is zero downtime on my vCenter platform; what I would be happy with is a <60 second failover. Did you ever get an answer for the crash/sync loss? Were you pushing the limits of inter-site bandwidth or latency?
After upgrading to 6.7 U3, and cleaning up the database due to another bug we ran into, it now boots really fast. Sure, the services take a bit longer to all start, but it's available again in under 2 minutes at most.
Also, I would not run a new setup on 6.7 U2. We have run into so many bugs and problems; I have a list of 40+ so far, and have had over 70 support tickets in 8 months. Make sure you start off with 6.7 U3!!!!!!!
I never got a root cause analysis or the hows and whys out of VMware support, but that has been the case for most of the tickets. The crashes sometimes took out both the active and passive VCHA nodes, and I never got an answer why. For the out of sync problem they could never tell me why, other than to just blow it away and start again. The last time it crashed we just killed the VCHA setup and went standalone, and it has been perfect since then.
Sometimes support has given me outright bad information that has caused us outages and further issues.
What level of support are you planning on paying for? Production? Or one of the more advanced levels? The Production support is best effort and they are not obligated to give you an RCA.
We have a 40 Gbps link as you do, with sub 5 ms RTT, usually around 2 ms.
Check these links regarding VCHA failover time: