VMware Cloud Community
vm7user
Enthusiast

Questions about all-flash vSAN

Hello,

I have a few questions about all-flash vSAN (all NVMe SSDs are the same!):

1) Is it really necessary to use the write cache (as a separate SSD)? Is it really impossible to create a vSAN without a write-cache disk?

2) (If the write cache is mandatory) For example, I have four 8TB SSDs on the host. Will I have to allocate 1TB for the write cache, and can I use the remaining 7TB for the datastore?

3) What's the point of a separate SSD for write cache if all disks are the same? Why can't vSAN write directly to the SSDs?


Accepted Solutions
TheBobkin
Champion

@vm7user 

1. Yes, it is necessary to use a whole, unpartitioned device certified for All-Flash Cache-tier use as the Cache-tier (it is advisable to validate that devices are on the vSAN HCL and certified for that purpose before purchasing anything). There is no Disk-Group without exactly one Cache-tier SSD/NVMe plus 1-7 Capacity-tier devices.
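As a sketch of that composition rule, here is a hypothetical helper (made-up name, not a VMware API) that checks the one-cache-plus-1-to-7-capacity constraint:

```python
# Hypothetical helper (not a VMware API) illustrating the Disk-Group
# composition rule: exactly one cache-tier device plus 1-7
# capacity-tier devices per Disk-Group.
def validate_disk_group(cache_devices, capacity_devices):
    if len(cache_devices) != 1:
        raise ValueError("a Disk-Group needs exactly one cache-tier device")
    if not 1 <= len(capacity_devices) <= 7:
        raise ValueError("a Disk-Group needs 1-7 capacity-tier devices")
    return True

validate_disk_group(["nvme0"], ["nvme1", "nvme2", "nvme3"])  # valid
```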

2. No, a whole device is needed. Using an 8TB device for this is not a good use of resources, as the write buffer in current versions of vSAN will only actively use a maximum of 600GB. You are better off using something smaller and faster, e.g. a 600-800GB write-intensive NVMe, over a possibly worse-performing 4-8TB read-intensive SSD/NVMe. (Bear in mind that by "worse performing" I mean like for like: an 8TB device of which only 600GB is used isn't going to deliver the full device's performance.)
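To put a number on that (back-of-the-envelope only, using the ~600GB active-buffer figure above and decimal TB):

```python
# Fraction of an 8TB device the ~600GB active write buffer would
# actually exercise (1TB = 1000GB here for simplicity).
buffer_gb = 600
device_gb = 8 * 1000
utilization = buffer_gb / device_gb
print(f"{utilization:.1%} of the device actively used as buffer")  # 7.5%
```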

3. Because this is how the vSAN architecture was designed. The intention is to use smaller, relatively fast devices as the Cache-tier and larger, less write-intensive devices as the Capacity-tier.

 

Here is a good basic overview of how Disk-Groups work:

Understanding vSAN Architecture: Disk Groups 


8 Replies
vm7user
Enthusiast

@TheBobkin 

>>better off using something smaller and faster (e.g. a 600-800GB write-intensive NVMe over a possibly worse performing 4-8TB read-intensive SSD/NVMe

>>Because this is how vSAN architecture has been designed. The intention is to use smaller, relatively faster devices as Cache-tier and then larger, less write-intensive devices as Capacity-tier.

 

But using read-intensive SSD in Capacity-tier is just one of many possible scenarios, I can have all mixed-use SSD or all write-intensive SSD, and I don't understand why I need a separate SSD for the cache in this case.

In my opinion, you should let users decide for themselves whether to use the SSD for cache or not.

 

vm7user
Enthusiast

and one more question:

When using a disk dedicated to write caching, do all writes occur in write-back mode? That is, as soon as a write lands on this disk, does the system report the write as completed, without waiting for the write to complete on the second host?

In this case, I understand the need for a separate disk for the write cache.

TheBobkin
Champion

@vm7user You have the understanding half right: IOs are committed once they hit the Cache-tier devices (i.e. they don't need to be physically written to the Capacity-tier devices to be committed). However, they are not committed (in DOM) until they have hit the Cache-tier devices in all Disk-Groups where the data-components reside and the DOM-Owner of the Object has received acknowledgement of this - otherwise it wouldn't really be synchronous replication and thus wouldn't truly provide redundancy.
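That acknowledgement flow can be sketched as a toy simulation (illustrative names only, not vSAN internals): the write is acknowledged to the VM only after the cache tier of every Disk-Group holding a replica has confirmed it back to the DOM owner, while the capacity tier is untouched.

```python
# Toy model of the acknowledgement flow: a write counts as committed
# only after the cache tier of every Disk-Group holding a replica has
# confirmed it back to the DOM owner. (Illustrative, not vSAN internals.)
def replicated_write(block, disk_groups):
    acks = 0
    for dg in disk_groups:
        dg["cache"].append(block)    # lands in the cache tier, not capacity
        acks += 1                    # confirmation back to the DOM owner
    return acks == len(disk_groups)  # only now is the VM acknowledged

dg_a = {"cache": [], "capacity": []}
dg_b = {"cache": [], "capacity": []}
assert replicated_write("blk-42", [dg_a, dg_b])
assert dg_a["capacity"] == []  # nothing has been destaged yet
```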

This is the main point of pretty much any Cache+Capacity architecture: use a faster buffer to absorb and commit IOs as quickly as possible. There are other benefits too, e.g. being able to keep data in the faster buffer space if it is just going to be re-modified over and over before eventually being destaged to the Capacity-tier.
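The re-modification benefit can be sketched like so (a toy write-back buffer, not vSAN code): a block rewritten many times in the cache tier is destaged to the capacity tier only once, with its final contents.

```python
# Toy write-back buffer: overwrites of the same block are absorbed in
# the cache tier, so destaging produces one capacity write per dirty
# block rather than one per overwrite. (Illustrative sketch only.)
cache = {}
capacity = {}

def write(lba, data):
    cache[lba] = data  # overwrites are absorbed in the buffer

def destage():
    capacity.update(cache)  # one capacity write per dirty block
    cache.clear()

for i in range(100):
    write(7, f"version-{i}")  # same block rewritten 100 times
destage()
print(len(capacity), capacity[7])  # 1 block destaged, holding "version-99"
```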

vm7user
Enthusiast

I am very disappointed with this solution.
Putting in a dedicated disk for write caching but then using it in write-through fashion is a very dubious design. Waiting for the write to travel over the network and through the TCP/IP stack greatly reduces performance.

TheBobkin
Champion

@vm7user If you don't like how the vSAN architecture works, or the requirements for it to work, then feel free not to use it - no one is forcing you to.

 

Going as fast as feasibly possible isn't the only or main focus of every storage solution; most people care about their data being resilient. I would advise doing some research on the trade-offs between faster and redundant/crash-consistent storage designs, e.g. ACID vs BASE, strict consistency vs eventual consistency. Whether one or the other suits your workload is up to you.

Tibmeister
Expert

Random reads from an SSD across the network are not always the fastest, so the cache tier will absorb some of that. Now, one can argue for using a portion of the capacity disk(s) for this instead, but cache SSDs get utterly destroyed by writes and fail much more often than data disks. Having the cache on a separate device protects the data disks from issues when the memory cells start going bad. In SSDs, as memory cells go bad, you lose capacity, just as with HDDs.

In every HCI solution I've seen that uses a network interconnect, a separate cache disk is absolutely needed; this really isn't unique to vSAN. Additionally, because writes to the SSDs are fast, a portion of the host RAM is used for write caching instead of a much slower disk, which is why vSAN (and other HCI solutions) are memory-intensive. If they used RAM for the write cache on a hybrid cluster, you would start starving the VMs - your workload - of resources, so using the cache disk for writes is needed in that case.

It's really an elegant and resilient solution, and as with any such solution, there are specific requirements in place to ensure that the data is intact, available, and rapidly ready.

vm7user
Enthusiast

>>Random reads on a SSD across the network are not always the fastest, so the cache tier will absorb some of that.

In all-flash configurations, vSAN uses the cache layer for write caching only.

 

>>Going as fast as feasibly possible isn't the only/main focus of every storage solution, most people care about their data being resilient

I still don't understand why, in my configuration where all disks are Intel Optane, I must add one more disk for the write cache.
