VMware Cloud Community
alainap84
Contributor

Planning for a DR event with vSAN

Hi everyone, we are running vSAN 6.x for our production server environment on 5 hosts. We just had a major event in which 2 disks needed to be replaced. We replaced those disks, and a resync of data kicked off. A few days later the resync was rebalancing 70TB worth of data. It slowed vSAN to a crawl, and our hosts became unresponsive as their disks filled up. We got on the phone with support, and they noticed our rebalance threshold was set to 95%. They believe that setting, combined with replacing two disks, caused the problem. We plan to change the rebalance threshold to their recommended 80% once this is over. We lost service on our mission-critical servers for two days.

Now I need to plan for redundancy. What if this happens again? How redundant is vSAN? What if vSAN itself becomes corrupt? Thinking outside the box, I realized we have no other storage array to move data to, and having one would have saved us in this event.

What do most people do for vSAN redundancy? What is recommended? What are the best options here? I am now planning for the future, as I consider this a disaster situation. Any ideas and thoughts are appreciated!

3 Replies
tayfundeger
Hot Shot

Hi,

You can use SRM or vSphere Replication in your vSAN infrastructure to replicate your virtual machines to a different site. vSphere Replication is included in your current license, so you do not need to make an additional purchase for it.

Thanks.

--
Blog: https://www.tayfundeger.com
Twitter: https://www.twitter.com/tayfundeger

vBlogger, vExpert, Cisco Champions

If this reply helped, please mark it "Helpful"; if it solved your problem, please mark it "Correct Answer".
dbalcaraz
Expert

Hello alainap84,

A couple of ideas for your DR.

  • vSAN is as redundant as you configure it to be. More disk groups with more disks make the cluster more robust.
  • Depending on the kind of corruption, the data on vSAN could become useless. If your RAID 5/6 (or whatever layout you use) loses more components than it tolerates, the data cannot be reconstructed, and you are left with no data.
  • The best options for redundancy are replication to another site/location and more disks per disk group to increase resiliency. A good VMware support contract is always a must!
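To put rough numbers on "as redundant as you configure it", here is a minimal Python sketch of the usual vSAN storage policy trade-offs: how much raw capacity each failures-to-tolerate (FTT) layout consumes, and how many hosts it needs. The multipliers and host minimums are the commonly documented ones for the original storage architecture, so please check them against the documentation for your exact version. The names `Policy`, `usable_policies`, and `raw_needed_tb`, and the 10 TB figure, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    raw_multiplier: float  # raw TB consumed per TB of VM data
    min_hosts: int         # minimum hosts required for the layout


# Commonly documented values; verify for your vSAN version.
POLICIES = [
    Policy("RAID-1, FTT=1", 2.0, 3),
    Policy("RAID-1, FTT=2", 3.0, 5),
    Policy("RAID-5, FTT=1", 4 / 3, 4),  # erasure coding needs all-flash
    Policy("RAID-6, FTT=2", 1.5, 6),    # erasure coding needs all-flash
]


def usable_policies(host_count: int) -> list[Policy]:
    """Return the layouts a cluster with this many hosts can satisfy."""
    return [p for p in POLICIES if p.min_hosts <= host_count]


def raw_needed_tb(vm_data_tb: float, policy: Policy) -> float:
    """Raw capacity consumed by vm_data_tb of VM data under a policy."""
    return vm_data_tb * policy.raw_multiplier


if __name__ == "__main__":
    # 5 hosts as in the original post; 10 TB is a hypothetical workload.
    for p in usable_policies(5):
        print(f"{p.name}: {raw_needed_tb(10, p):.1f} TB raw for 10 TB of data")
```

With 5 hosts, RAID-6 is not available because it needs a sixth host, which is one reason a small cluster has fewer options than it might appear. None of these layouts protects against a problem that affects the whole datastore, which is why replication to another site still matters.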

Hope this helps.


Regards,

-------------------------------------------------------- "I greet each challenge with expectation"
Techstarts
Expert

It must have been a highly stressful time for you and your team. I am sorry to hear it, and it is a bit difficult for me to imagine.

A few important points:

  1. You need a data protection strategy (for "what if vSAN itself becomes corrupt?").
  2. You need Business Continuity Plans (for "Thinking outside the box I realized we have no other storage array to move data to and that would have saved us in this event.").
  3. How redundant vSAN is depends on the FTT (failures to tolerate) setting you choose, together with points 1 and 2 above.

If I were you, I would divide this goal into three parts:

1. Things which you can do immediately

2. Things which you can do in 6 months

3. Things which you can do in one year

1. Easy Wins

  • Is your team properly trained on vSAN? Have they read the vSAN operations guide?
  • Is monitoring for your vSAN components properly configured? If not, configure it.
  • Is your vSAN configured as per VMware recommendations? If not, this is the time to fix it.
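As one example of what to monitor, here is a rough Python sketch of a capacity headroom check: if one host is lost, would the data still fit on the surviving hosts below a fill threshold? It assumes equally sized hosts and that all used capacity must end up on the survivors, which is a simplification. The function names and the 20 TB per host figure are hypothetical; the 80% default simply mirrors the threshold support recommended in the original post.

```python
def utilization_after_host_loss(
    host_capacity_tb: float, host_count: int, used_tb: float
) -> float:
    """Percent of surviving raw capacity in use after losing one host.

    Simplifying assumptions: every host has the same capacity and all
    currently used capacity has to be held by the surviving hosts.
    """
    if host_count < 2:
        raise ValueError("need at least two hosts to survive a host loss")
    surviving_tb = host_capacity_tb * (host_count - 1)
    return 100.0 * used_tb / surviving_tb


def has_rebuild_headroom(
    host_capacity_tb: float,
    host_count: int,
    used_tb: float,
    threshold_pct: float = 80.0,
) -> bool:
    """True if the cluster stays at or below the threshold after a host loss."""
    after = utilization_after_host_loss(host_capacity_tb, host_count, used_tb)
    return after <= threshold_pct


if __name__ == "__main__":
    # Hypothetical cluster: 5 hosts with 20 TB raw each.
    for used in (60.0, 70.0):
        pct = utilization_after_host_loss(20.0, 5, used)
        ok = has_rebuild_headroom(20.0, 5, used)
        print(f"{used:.0f} TB used -> {pct:.1f}% after host loss, headroom ok: {ok}")
```

An alert that fires when this check turns false gives you time to add capacity or move data before a failure forces a resync onto nearly full disks.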

2. Short Term Fixes

  • Do you know which of your Services/Applications/Servers/VMs are Mission Critical, and how critical each one is?
  • Now attach the RTO and RPO requirements to each of them. If these are not in place, start measuring how quickly you can restore your backup data.
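Once RTO and RPO are attached to each service, checking them is simple arithmetic. The Python sketch below shows the idea: RPO is met if the newest recoverable copy is no older than the RPO, and RTO is met if the restore time you actually measured in a test fits within the RTO. The `Service` fields and the sample values are hypothetical and would come from your own inventory and restore tests.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Service:
    name: str
    rpo: timedelta               # maximum tolerable data loss
    rto: timedelta               # maximum tolerable downtime
    last_good_copy: datetime     # newest backup or replica known to be restorable
    measured_restore: timedelta  # duration of the last restore test


def rpo_met(service: Service, now: datetime) -> bool:
    """The newest recoverable copy must not be older than the RPO."""
    return now - service.last_good_copy <= service.rpo


def rto_met(service: Service) -> bool:
    """The measured restore time must fit within the RTO."""
    return service.measured_restore <= service.rto


if __name__ == "__main__":
    now = datetime(2020, 1, 1, 12, 0)  # fixed time to keep the example repeatable
    erp = Service(
        name="erp-db",  # hypothetical service
        rpo=timedelta(hours=1),
        rto=timedelta(hours=4),
        last_good_copy=datetime(2020, 1, 1, 9, 0),
        measured_restore=timedelta(hours=3),
    )
    print(erp.name, "RPO met:", rpo_met(erp, now), "RTO met:", rto_met(erp))
```

In this made-up case the restore test fits the RTO, but the newest copy is three hours old against a one-hour RPO, so the replication or backup interval is what needs fixing.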

3. Long Term Plan

  • A remote site (Availability Zone, Region) as defined by the VMware Validated Architecture, which comes down to a choice between:
    • A vSAN Stretched Cluster
    • The classical approach

I hope this helps. If you wish to discuss further, please reach out to me.

With Great Regards,