We just deployed an all-flash vSAN cluster consisting of 4 Dell R640 ready nodes. Each node contains:
2 Intel Xeon Gold 6246 @ 3.30 GHz
382 GB RAM
1 Intel Optane P4800x for cache
4 NVMe PM1725B for capacity
1 disk group per node
The vSAN traffic is running over a 25 GbE core. Dedup and compression are disabled, as is encryption. We're on vSAN 6.7 U3, with all firmware and drivers up to date. The storage policy is RAID 1, FTT=1.
I've deployed HCIBench and am currently running test workloads with it. The datastore is empty except for the HCIBench VMs. The Easy Run workload of 4K / 70% read / 100% random produced the following results:
I/O per Second: 189042.27 IO/S
Throughput: 738.00 MB/s
Read Latency: 1.48 ms
Write Latency: 1.15 ms
95th Percentile Read Latency: 3.00 ms
95th Percentile Write Latency: 2.00 ms
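As a quick sanity check on numbers like these, throughput should equal IOPS × block size. A small Python sketch using the figures above (and assuming HCIBench reports base-1024 units) confirms the summary is internally consistent:

```python
# Sanity-check the HCIBench summary: throughput = IOPS x block size.
# Assumption: HCIBench's "MB/s" figure is actually MiB/s (base 1024).
iops = 189042.27   # reported I/O per second
block_kib = 4      # 4K workload

throughput_mib_s = iops * block_kib / 1024
print(round(throughput_mib_s, 2))  # ~738.45, matching the reported 738.00 MB/s
```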
What should I be shooting for in the HCIBench results to verify all is well before I begin moving my production workload onto vSAN? I'm currently running the other 3 Easy Run workloads and can post those results if needed.
VMware has reviewed our cluster end to end, and while they originally suspected a networking problem, they eventually identified the bug.
@BB9193 Do you have a service request or bug number we can reference on the Dell or VMware side? I believe we might be having the same issue with our 3-node Dell all-flash ready node vSAN environment.
The environment meets or exceeds our performance expectations in every respect except for single-VM burst IO requirements.
VMware cannot find any issues and has confirmed the performance is what is expected, but when running high-burst IO against a single VM with HammerDB, our performance is terrible.
Comparing a single host of the same spec against the 3-node vSAN, we are seeing a TPM drop of at least 70%, and in some cases higher.
While we expect some performance drop going from a single host to HCI, we do not believe the burst IO drops we are seeing are expected.
The test we are running:
HammerDB - SQL TPC-C (cloned VMs, so identical)
1 Warehouse, 10 Users
Single host - 335000 TPM
VSAN - 95000 TPM
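For reference, the drop those two numbers represent can be computed directly (a trivial sketch, nothing assumed beyond the TPM figures quoted above):

```python
# Relative TPM drop going from a single host to the 3-node vSAN cluster.
single_host_tpm = 335_000
vsan_tpm = 95_000

drop = 1 - vsan_tpm / single_host_tpm
print(f"{drop:.1%}")  # ~71.6%, consistent with the "minimum 70%" figure above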
The Dell SR is 1044143886 and I believe the VMware SR is 20175210611. I also recommend running HCI Bench to get full benchmarks for your entire cluster.
Watch the performance graphs per VM during normal day-to-day operations. What we see are random latency spikes throughout the day on all VMs running on vSAN, typically anywhere from 5 ms to 30 ms. We recently saw a spike of 130 ms.
This latency issue was totally unexpected for us, as we anticipated all-flash with Optane to have extremely low latency. Dell was caught off guard as well.
Install a free appliance called "SexiGraf" and configure it to talk to your vCenter. After an hour or so, you can select dashboards for the various vSAN layers and find out which latency at which level is affecting you: Disk, Client, or Owner. It's a very good tool for this sort of thing and easier to use than VMware's own IOInsight. It also runs all the time, so you can look at historical data.
Everything you need to download, and all the info, is on this Quickstart page.
@TomIvone You guys make any progress on this?
Hello
That explains why I can't reproduce the huge latency spikes I see on my all-flash vSAN cluster using I/O Analyzer; I use small disks on those appliances and get good results...
But on the other hand, production VMs running a sustained write stream show very poor performance:
(this is a PFTT=1, SFTT=1, RAID 1 VM)
On my side, I've had no success with VMware support (I got tired of running tests and debugging on our production environment while the issue is undeniable).
My personal feeling is that vSAN does not work well with large IO sizes (>256 KB), but your issue is with 4 KB IOs...
Following this thread for news.
I'd be happy to run benchmarks and share the results here to compare performance and help, if you tell me which benchmarks you'd like me to run.
Our current setup is stretched (I can run tests with a non-stretched storage policy): 7+7 all-flash, each node with 3 disk groups (6+1, 5+1, 5+1), 25 GbE networking, Dell S5248F-ON switches.
My HCI Bench test results and parameters are posted earlier in this thread if you want to try any of those for comparison.
I'm told these issues will possibly be addressed in 7.0 U3.
Can you confirm the benchmark target you want me to test?
4K 100% Write 100% Random - 7 workers, each on a different ESXi host
PFTT=0, SFTT=1 - RAID 1 (non-stretched)
Here are my results from I/O Analyzer:
Write latency about 5ms
Same test but 100% read instead of 100% write :
Read latency about 0.5ms
Please note that, unlike you, I have no NVMe, only regular SAS SSDs (write-intensive for cache).
My I/O Analyzer appliance is pretty old and doesn't generate graphs or latencies anymore; I can set up HCIBench if needed.
We were unable to prove 100% that this was our issue, and we determined that the workload requirement was unsuitable for vSAN; tests within a Dell lab using Optane-based storage were also unable to reach the performance we require in a single-VM use case.
At this stage we are still deciding whether to keep vSAN, since it meets 99% of our requirements, and use a single host + replication for the remaining 1%; the other option would be to ditch vSAN and go back to the SAN + RAID 10 model.
@Sharantyr3 Yes, you would need to use HCI Bench. Here are some of my results from previous runs:
My 4K 100% Read 100% Random results are:
Number of VMs: 8
I/O per Second: 330801.05 IO/S
Throughput: 1292.00 MB/s
Read Latency: 0.82 ms
Write Latency: 0.00 ms
95th Percentile Read Latency: 1.00 ms
95th Percentile Write Latency: 0.00 ms
Here are my 4K 100% Write 100% Random results:
Number of VMs: 8
I/O per Second: 104066.28 IO/S
Throughput: 406.00 MB/s
Read Latency: 0.00 ms
Write Latency: 2.63 ms
95th Percentile Read Latency: 0.00 ms
95th Percentile Write Latency: 8.00 ms
@TomIvone What size was the vmdk on the test VM? vSAN has throughput limitations per vmdk.
Supposedly they have made decent performance improvements in the later 7.x releases. I'm hoping there will be a business-stable release of 7.x later this year.
120GB and 100GB.
My requirement for one VM in this environment is our main issue.
Using HammerDB with MS SQL, it must be able to reach 200,000 TPM with the following:
Number of warehouses: 1
Virtual users to build schema: 1
Virtual users: 10
@TomIvone We are also disappointed with our SQL performance in vSAN.
Hello,
I tried HCIBench, but the pressure on our vSAN cluster was too high; it's the first time I've seen the "congestions" counter rising. I had to abort testing when the vSAN performance service hung and alarms about host communication problems with vCenter started firing.
Also, you didn't mention which HCIBench mode you ran (Easy Run or custom?). I chose custom, as I don't like "auto" things running on their own, but maybe 7 VMs with 4 disks each was too much.
How many VMs per ESXi host, and how many disks of what size per test VM, did you run?
Also, what "Working-Set Percentage" did you choose?
I can run new tests during off-hours.
Edit: benchmark with reduced load on vSAN (number of VMDKs per VM reduced to 1):
| Case Name | Job Name | Number of VMs | VMs Finished Early | IOPS | Throughput (MB/s) | Read Latency (ms) | Write Latency (ms) | Read 95th %ile Latency (ms) | Write 95th %ile Latency (ms) | Block Size | Read % | Total Outstanding IO | Physical CPU Usage | Physical Memory Usage | vSAN CPU Usage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| fio-1vmdk-100ws-4k-0rdpct-100randompct-2threads-1617721835 | job0 | 7 | 0 | 12393.06 | 48 | 0 | 1.13 | 0 | 1 | 4K | 0% | 14 | 0.0% | 44.93% | 0.0% |
| fio-1vmdk-100ws-4k-100rdpct-100randompct-2threads-1617722455 | job0 | 7 | 0 | 38420.11 | 150 | 0.37 | 0 | 0 | 0 | 4K | 100% | 14 | N% | N% | |
| fio-1vmdk-100ws-512k-0rdpct-100randompct-2threads-1617723057 | job0 | 7 | 0 | 4067.45 | 2033 | 0 | 3.65 | 0 | 4 | 512K | 0% | 14 | 0.0% | 44.93% | 0.0% |
| fio-1vmdk-100ws-512k-100rdpct-100randompct-2threads-1617723569 | job0 | 7 | 0 | 8413.18 | 4206 | 1.73 | 0 | 3 | 0 | 512K | 100% | 14 | 0.0% | 44.93% | 0.0% |
Working set %: 100
1 VM per ESXi host, 1 × 10 GB disk, 2 threads per VM
7 VMs total, RAID 1
Weird: even after reducing the load on vSAN, the vSAN performance service seems to get hammered by HCIBench and becomes unresponsive while the benchmarks run (no more performance graphs in the vCenter vSAN tab).
I need your exact test specifications to run the same here for comparison.
Edit 2:
| Case Name | Job Name | Number of VMs | VMs Finished Early | IOPS | Throughput (MB/s) | Read Latency (ms) | Write Latency (ms) | Read 95th %ile Latency (ms) | Write 95th %ile Latency (ms) | Block Size | Read % | Total Outstanding IO | Physical CPU Usage | Physical Memory Usage | vSAN CPU Usage |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| fio-4vmdk-100ws-4k-0rdpct-100randompct-2threads-1617737533 | job0 | 7 | 0 | 49954.22 | 195 | 0 | 1.12 | 0 | 1 | 4K | 0% | 56 | N% | N% | |
| fio-4vmdk-100ws-4k-100rdpct-100randompct-2threads-1617738028 | job0 | 7 | 0 | 146658.57 | 572 | 0.38 | 0 | 0 | 0 | 4K | 100% | 56 | 15.69% | 45.0% | 1.61% |
| fio-4vmdk-100ws-512k-0rdpct-100randompct-2threads-1617739020 | job0 | 7 | 0 | 10684.99 | 5342 | 0 | 5.34 | 0 | 9 | 512K | 0% | 56 | N% | N% | |
| fio-4vmdk-100ws-512k-100rdpct-100randompct-2threads-1617739833 | job0 | 7 | 0 | 26186.92 | 13093 | 2.16 | 0 | 3 | 0 | 512K | 100% | 56 | 0.0% | 45.0% | 0.0% |
Working set %: 100
1 VM per ESXi host, 4 × 100 GB disks, 2 threads per VM
7 VMs total, RAID 1
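As a cross-check of the 4-VMDK results above (whose raw CSV uses European decimal commas), IOPS × block size should reproduce the throughput column to within rounding; a small Python sketch, assuming the throughput column is in MiB/s:

```python
# Verify Throughput = IOPS x block size for the 4-vmdk runs above.
# Decimal commas from the raw HCIBench CSV are normalized to points first.
rows = [
    ("49954,22",  4,   195),    # 4K 100% write
    ("146658,57", 4,   572),    # 4K 100% read
    ("10684,99",  512, 5342),   # 512K 100% write
    ("26186,92",  512, 13093),  # 512K 100% read
]
for iops_str, block_kib, reported_mib in rows:
    iops = float(iops_str.replace(",", "."))
    mib_s = iops * block_kib / 1024
    print(f"computed {mib_s:8.1f} vs reported {reported_mib}")
```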
Looking at my numbers, I find the percentage difference between read and write IOPS to be about the same as yours (you are faster, but have fewer cache disks).
What numbers were you expecting (what did the pre-sales vendor promise)?
I too am finding unexpectedly "average" performance on a 5-node all-flash vSAN cluster I'm working on for a client (vCenter 6.7, vSAN on-disk format version 10).
The environment is a private cloud, so I don't have full visibility of the back-end network; however, I can see that the hardware specs of the host servers are very good and modern.
I use a combination of tools to assess performance: HCIBench for cluster-level performance, but for 'real-world' testing I use simple SSD read/write utilities on a Windows guest VM. I won't go into the exact numbers, but basically the guest VM performance on vSAN is worse than on VMware Workstation installed on my laptop with a single consumer-grade SSD. It is only slightly better than my lab server, which has 8x Samsung EVO 850s in RAID 10 on an old LSI MegaRAID 9261-8i controller.
I have a case open with VMware and the private cloud vendor, so we are trying to work out what is causing the poor write performance and high latency spikes.
For me, it was because on the vSAN NICs I was not using LACP; the 2 ports were configured as active/active instead of active/passive, and the Aruba switch was awful at intra-switch performance.
Same here, and I'll tell you why: benchmarks use small-to-average IO sizes, so they show good performance on vSAN.
When you use file copy or any real-world use case, like a database dump, you may get poor performance (by poor I mean not what you would expect from SSDs). I write "may" because it depends on the OS, filesystem, etc., but in the end, if your IO sizes are > 1 MB, performance is poor.
Everyone here will just say, "file copy is not a real benchmark," end of discussion.
But like you, I think that while it may not be a benchmark, it is a real-world use case, and the performance is not OK.
And I can't find any bottleneck in the chain, and neither could VMware support, so I gave up on my support request.
Overall, our servers work just fine, because 90% of IOs are small. Just don't be surprised that when you do a file copy you get poor performance; it's by design.
We had an escalation ticket open about this for months, and support eventually confirmed that vSAN 6.7 has low throughput limits per vmdk. They actually suggested we break up all vmdks to no more than 50 GB each to get around these limits, which is obviously ridiculous.
If you have tickets open, ask support about the vmdk throughput limit, as I don't recall the specifics.
Hi,
Using a single file-copy job on any shared storage array will always result in lower performance.
This is by design: shared storage is built to be accessed by multiple hosts (different OSes, different applications, different IO profiles, multiple threads).
So if your use case requires high single-thread IO performance, it will require tuning on the VM side.
Here are some recommendations to increase Windows single-thread IO performance.
These should increase the performance of those single-thread applications.
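The single-thread ceiling follows from Little's Law: with one outstanding IO at a time, throughput is bounded by 1/latency regardless of how much hardware sits behind the datastore. A quick illustrative sketch (the latency figures are made up for illustration):

```python
# Little's Law: IOPS = outstanding IOs / latency.
# A single-threaded copy at queue depth 1 is capped by per-IO latency,
# no matter how many flash devices back the datastore.
def max_iops(queue_depth: int, latency_ms: float) -> float:
    return queue_depth / (latency_ms / 1000)

print(max_iops(1, 2.0))    # QD=1  at 2 ms -> 500.0 IOPS max
print(max_iops(32, 2.0))   # QD=32 at 2 ms -> 16000.0 IOPS max
```

This is why increasing outstanding IOs (more threads, deeper queues, more vmdks) helps even when per-IO latency stays the same.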
Just a side note.
Windows Explorer file-copy tasks use different IO sizes for reads and writes, but both are larger than 64 KB per IO.
vSAN is optimized for 64 KB IOs.
vSAN has to split larger IOs into smaller chunks, and that activity increases the latency of each IO.
And when using only one or a few vmdks with the default SPBM stripe setting of 1, you might end up working against a single cache device instead of spreading the load across multiple cache devices.
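To illustrate the splitting effect described above (a deliberately simplified model, not vSAN's actual internals): a large guest IO chopped into 64 KiB chunks turns into several back-end sub-IOs, each carrying its own per-IO overhead:

```python
# Simplified model of large-IO splitting (NOT vSAN's real internals):
# a guest IO larger than 64 KiB is serviced as ceil(size / 64 KiB) sub-IOs.
from math import ceil

CHUNK_KIB = 64

def sub_ios(io_size_kib: int) -> int:
    return ceil(io_size_kib / CHUNK_KIB)

for size in (64, 256, 1024):
    print(f"{size} KiB guest IO -> {sub_ios(size)} sub-IOs")
# 64 -> 1, 256 -> 4, 1024 -> 16
```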
But I would be really interested in a reference/KB article regarding the vmdk throughput limit.
Hello,
What you are writing is partially right. Have you ever tested a file copy on a low-end array?
I get more throughput from an array of 60 magnetic disks in sequential read/write than from an all-flash vSAN cluster.
But it's as you said: vSAN is there to serve many "customers", not only my benchmark, whereas the low-end array goes all in for me.
I think the main problem here is accepting the fact that, in a vSAN cluster, you may get less performance on one job with hundreds of flash disks than you would get from a single direct-attached flash disk (or even a local RAID 5 card with 3 flash disks). It's disappointing, but it's by design.
And it's good to know: one VM can't compromise the whole vSAN storage cluster.
The most frustrating part, I think, is that no bottleneck is visible anywhere in the chain:
Check data disks: latency OK
Check cache disks: latency OK and not filled up
Check network cards: way below full bandwidth
Check switches: underutilized
Check vSAN: no bottleneck in the graphs
Also, some graphs don't make sense to me, and that is frustrating too.
But I believe the global vSAN graphs are not accurate, and it's related to our discussion: the vSAN graphs show an average of the performance of the VMs doing IOs.
So if, at a specific point in time, only one VM is doing high IOPS, that VM will make the "average" graphs go crazy.
A typical example: backups.
Look at this (I removed writes, as they don't spike and it's easier to read and get my point):
We see huge spikes in read latency while IOPS don't rise but throughput does; that's the sign of an increasing IO size hitting the vSAN.
It's typical of backup jobs reading big chunks.
You can see later a read IOPS spike during which latency doesn't increase nearly as much.
At first I thought there was a latency problem in my vSAN, but in fact there is no problem: just one VM doing a lot of work, producing a "false average" in the latency graph.
I believe that if, during this backup window, I had many VMs doing "normal" IOs (at least 3 or 4 times more than the backup job), my graphs would show "normal" latency, not such spikes.
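The dilution effect described above is easy to model. An IOPS-weighted average latency (an assumption on my part about how the cluster-wide graph is computed) barely moves when plenty of low-latency IOs accompany the backup's slow large reads:

```python
# IOPS-weighted average latency (an assumption about how the global
# vSAN graph is computed, used here only to illustrate the dilution).
def weighted_avg_latency(vms):
    total_iops = sum(iops for iops, _ in vms)
    return sum(iops * lat for iops, lat in vms) / total_iops

backup = (500, 30.0)         # 500 IOPS of large reads at 30 ms
normal = [(5000, 1.0)] * 4   # four busy VMs at 1 ms each

print(weighted_avg_latency([backup]))           # 30.0 ms: the "spike"
print(weighted_avg_latency([backup] + normal))  # ~1.7 ms: diluted away
```

So the same backup job shows up as a dramatic latency spike on a quiet cluster, and as almost nothing on a busy one.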
I don't know how this could be fixed.