rickolson2018
Contributor

vSAN All-Flash - Happy with performance?

Hello, greetings and salutations.

I'm nearing 2 months into my new role, where I've inherited an 8-node VxRail all-flash vSAN cluster (ESXi 6.5, separate vCenter, 10G fiber networking to Cisco Nexus switches).  At the moment it is not in full production, but that will change in 3 weeks (whether we're ready or not).  I've got plenty of VMware experience, and I understand HCI, but this is my first hands-on with vSAN, so it's been a pretty fun few weeks for me.

I'm working through a few odd write-latency issues that are reported by vCenter and the vSAN Health Service, but don't appear to have any negative impact from the users' standpoint while we're doing our tests.  Now I realize that each case is different - I've actually got an open case with Dell (who has roped in a VMware engineer) to assist in analyzing the data.

But I'm curious what users of a vSAN All-Flash array have to say.  Do you like the performance?  Do you have any instances of "good latency"?  Is there such a thing?  Do you ever see 'network latency' on your vSAN VMKernel Adapter (because I can sometimes see 4-6ms network latency when running HCI Bench tests, this on 10G fiber)?  Overall are you happy with what you're seeing out of all-flash vSAN clusters?
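For anyone wanting to reproduce the network-latency check outside of HCI Bench, one simple sanity test is to ping between hosts directly on the vSAN vmkernel interfaces and compare the round-trip times against what the performance service reports. This is just a sketch; the interface name and peer IP below are placeholders, not values from the original post.

```shell
# Identify which vmkernel interface carries vSAN traffic on this host:
esxcli vsan network list

# Ping a peer host's vSAN IP from that interface and watch the round-trip
# times (vmk2 and 10.0.0.12 are placeholders for your environment):
vmkping -I vmk2 -c 10 10.0.0.12
```

On an idle 10G network the round-trip times should be well under a millisecond; latency in the 4-6 ms range under HCI Bench load could simply reflect queuing while the test saturates the links, which is why comparing loaded vs. idle numbers is useful.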

3 Replies
Viperman
Contributor

I have several customers running it in production with absolutely zero issues. 

Wolken
Enthusiast

I haven't checked it personally, but this blog, which I discovered recently, might shed some light on your question: Challenging 4-node VMware vSAN cluster performance

Darking
Enthusiast

I am running an all-NVMe-flash 8-node stretched cluster. Each host consists of 2 disk groups with 1 cache and 3 capacity disks.

We have been testing it with HCIBench.

Due to the stretched-cluster aspect, I will be seeing higher latency than you might experience.

Anyhow, running a random 70/30 read/write test I see around 220,000 read IOPS and around 70k write IOPS, at latencies of around 1.2 ms for reads and 2.2 ms for writes.

I've also tested large sequential writes and capped out my 200 Gbit site interconnect.

So the speed is definitely there. In my setup I've found that a stripe width of 3 gives me maximum performance, but it does not differ much from the default RAID-1 policy.
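For reference, stripe width is normally set through an SPBM storage policy in vCenter, but it can also be experimented with from the host CLI via the default vSAN policy. A rough sketch, assuming the `esxcli vsan policy` namespace on your build (the values shown are examples, not the poster's actual settings):

```shell
# Show the current default vSAN policy per object class:
esxcli vsan policy getdefault

# Set a stripe width of 3 for new vdisk objects, keeping
# hostFailuresToTolerate at 1 (the RAID-1 mirroring default):
esxcli vsan policy setdefault -c vdisk \
  -p '(("stripeWidth" i3) ("hostFailuresToTolerate" i1))'
```

Note this only affects objects created without an explicit policy; for benchmarking comparisons like the one above, a dedicated SPBM policy applied to the test VMs is the cleaner approach.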

We have been testing without jumbo frames, but will be switching to them now to bring down CPU utilization a bit.
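For anyone making the same jumbo-frames switch, the MTU has to be raised on both the vSwitch and the vSAN vmkernel port, and the physical switch ports must allow it end to end. A minimal sketch, where vSwitch1, vmk2, and the peer IP are assumptions to adapt to your environment:

```shell
# Raise the MTU on the virtual switch and the vSAN vmkernel interface
# (the upstream Nexus ports must also be configured for MTU 9000):
esxcli network vswitch standard set -v vSwitch1 -m 9000
esxcli network ip interface set -i vmk2 -m 9000

# Confirm end-to-end with a don't-fragment jumbo ping to a peer host's
# vSAN IP (8972 bytes = 9000 minus IP and ICMP headers):
vmkping -I vmk2 -s 8972 -d 10.0.0.12
```

If the jumbo ping fails while a normal-sized ping succeeds, some hop in the path is still at MTU 1500, which typically shows up as the exact kind of health-check alarms the original poster mentioned.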