VMware Cloud Community
bigdazza75
Enthusiast

Is VSAN really as great for TCO as it first appears?

I'm looking at VSAN 6 as a possible way forward for us, but I have a few concerns. The TCO arguments for VSAN are strong at face value. But as I've dug deeper, something is bothering me: I'm not sure I'd be fully comfortable with FTT=1. The reason is simple maintenance. When you have a host failure or planned host maintenance, there is a period of time where a single SSD failure would result in major data loss, unless in the planned scenario I migrate all the data first. Even on a 10G cluster this could take a while, and it also vastly increases my maintenance window if I'm rolling an update across the entire cluster. So really I'd feel better with FTT=2. That means (in my case) increasing my cluster from 4 hosts to 5, absorbing significant extra $$$ on hardware and licensing, and reducing raw capacity efficiency to 33% (notwithstanding that I can actually only use 70% of this 33%). The whole TCO support for my business case then starts to look a little flaky (or at least flakier).

Thoughts and input welcome.
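The FTT arithmetic in the question can be sketched in a few lines. This is only an illustration, assuming mirrored replicas (each object stored FTT+1 times), the usual 2*FTT+1 minimum host count, and the 70% fill guideline mentioned in the post. The function names are made up for this example and are not part of any VMware tool.

```python
from fractions import Fraction


def min_hosts(ftt: int) -> int:
    """Minimum hosts for a mirrored policy: 2 * FTT + 1 (replicas plus witnesses)."""
    return 2 * ftt + 1


def raw_efficiency(ftt: int) -> Fraction:
    """Share of raw capacity holding unique data when every object has FTT + 1 copies."""
    return Fraction(1, ftt + 1)


def usable_fraction(ftt: int, fill_limit: Fraction = Fraction(7, 10)) -> Fraction:
    """Raw efficiency scaled by the fill guideline (70% assumed, as in the post)."""
    return raw_efficiency(ftt) * fill_limit


if __name__ == "__main__":
    for ftt in (1, 2):
        print(
            f"FTT={ftt}: min hosts={min_hosts(ftt)}, "
            f"raw efficiency={float(raw_efficiency(ftt)):.1%}, "
            f"usable after fill limit={float(usable_fraction(ftt)):.1%}"
        )
```

Under these assumptions, going from FTT=1 to FTT=2 drops the usable share of raw capacity from 35% to roughly 23%, which is the gap the business case has to absorb.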

2 Replies
NuggetGTR
VMware Employee

As you mentioned, when placing a host in maintenance mode you have the option of a full data migration, which removes your concern but adds time and overhead.

This really all comes down to the SLA and the importance of the data. The great thing with SSDs is that they generally don't just fail like spinning disks but degrade, so a hard failure is a lot less likely. It is still a risk, though. If adding an additional host and disks makes it more expensive than buying an array with the same level of performance and better redundancy, then that might be the better option for your case.

The way I look at it, an environment has a higher risk of someone deleting a VM or accidentally blowing away the wrong LUN than of an SSD failing in the 15 minutes a host is in maintenance. It's just about what level of risk is accepted for the data or systems running in the environment.

It is also worth mentioning that this particular risk lessens as the number of hosts in the cluster grows. Another failure would still have an impact, but less of one than with only 3 hosts, because the chance of an object having one copy on the host in maintenance and its other copy on the failed host goes down as the host count goes up.
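To put rough numbers on that last point, here is a small sketch. It assumes FTT=1 objects with two data replicas placed uniformly at random on distinct hosts, and it counts only the data replicas (witness components are ignored), so treat it as an illustration rather than a model of real VSAN placement. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def double_hit_fraction(num_hosts: int) -> Fraction:
    """Fraction of FTT=1 objects with both replicas on one specific pair of hosts.

    Host 0 stands for the host in maintenance and host 1 for the host that
    then suffers a failure; by symmetry any other pair gives the same answer.
    """
    if num_hosts < 3:
        raise ValueError("FTT=1 needs at least 3 hosts")
    # Every way of placing two replicas on two distinct hosts.
    placements = list(combinations(range(num_hosts), 2))
    hits = sum(1 for pair in placements if set(pair) == {0, 1})
    return Fraction(hits, len(placements))


if __name__ == "__main__":
    for n in (3, 4, 5, 8):
        share = double_hit_fraction(n)
        print(f"{n} hosts: {share} of objects exposed ({float(share):.1%})")
```

Under these assumptions the exposed share is 1 in 3 with 3 hosts, 1 in 6 with 4, and 1 in 10 with 5, so each extra host shrinks the blast radius of a second failure noticeably.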

________________________________________ Blog: http://virtualiseme.net.au VCDX #201 Author of Mastering vRealize Operations Manager
elgwhoppo
Hot Shot

I think the real question you have is the one I had a long time ago: can Update Manager force a full data migration when placing hosts in maintenance mode? That would give you warm fuzzies for FTT=1 during an update window.

Also, with 5 servers, just how long are you thinking your update process is going to take? Even if you skimp on networking and use 1Gb, you should still be able to update the entire cluster in less than 2 days with full evacuations for warm fuzzy feelings. It just wouldn't be automated, which to me is whatever for a cluster of that size. You usually have to plan at least 2 days for a vSphere upgrade even if you're using traditional storage solutions.
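If you want to sanity-check the window for your own cluster, a back-of-the-envelope estimate looks something like this. Everything here is an assumption you would replace with your own figures: the amount of data per host, the share of the link that evacuation traffic actually gets, and the patch time per host. It also ignores any resync traffic after a host comes back. The function names are made up for the example.

```python
def evacuation_hours(data_tb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Hours to move data_tb terabytes over a link_gbps link.

    efficiency is the assumed share of line rate the evacuation really achieves.
    """
    if data_tb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits_to_move = data_tb * 1e12 * 8           # decimal terabytes to bits
    effective_bps = link_gbps * 1e9 * efficiency
    return bits_to_move / effective_bps / 3600  # seconds to hours


def rolling_update_hours(
    hosts: int,
    data_tb_per_host: float,
    link_gbps: float,
    patch_hours_per_host: float = 0.5,
    efficiency: float = 0.7,
) -> float:
    """Total hours when hosts are evacuated and patched strictly one at a time."""
    per_host = evacuation_hours(data_tb_per_host, link_gbps, efficiency) + patch_hours_per_host
    return hosts * per_host


if __name__ == "__main__":
    # Hypothetical figures: 5 hosts holding 2 TB each, compared on 1Gb and 10Gb.
    for gbps in (1, 10):
        total = rolling_update_hours(hosts=5, data_tb_per_host=2.0, link_gbps=gbps)
        print(f"{gbps} Gb/s: about {total:.1f} hours for the whole cluster")
```

The result scales linearly with the data per host, so plug in your real consumed capacity before deciding whether full evacuations fit your window.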

The TCO is proven IMO, even with the 70% capacity utilization recommendation. Work with someone to spec out an array with the same usable capacity, a storage switch, plus servers and compare the cost. Oddly enough, CFOs are usually the ones that love VSAN the most. It all depends on what you're comparing it to and what your requirements are, especially as it pertains to interoperability.

VCDX-Desktop