28 Replies Latest reply on Sep 19, 2016 1:53 PM by darcidinovmw

    vSAN 6.0 LAB - SSD Write Cache performance

    COS Master

      Setup:

      3x HP DL360 G6 servers, 64GB RAM, 2x Xeon X5670 CPUs, P410i RAID controller w/512MB cache

      Disk 1 900GB SAS 10KRPM as RAID 0 on SAS channel 1 (Controller cache enabled)

      Disk 2 900GB SAS 10KRPM as RAID 0 on SAS channel 1 (Controller cache enabled)

      Disk 3 2x 120GB SATA SSD as RAID 0 (240GB total) on SAS channel 2 (Controller cache disabled)

       

      1 NIC for management 1Gb/s Full

       

      2 NICs for vMotion 1Gb/s Full

       

      2 NICs for vSAN traffic 1Gb/s Full
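
      For reference, the disk group layout and the vSAN VMkernel interface can be sanity-checked from the ESXi shell on each host. A quick sketch using the stock esxcli vsan namespace (nothing here is host-specific):

        # List the disks this host has claimed for vSAN (cache vs. capacity tier, disk group membership)
        esxcli vsan storage list

        # Show which VMkernel interface is tagged for vSAN traffic
        esxcli vsan network list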

       

      Almost everything is working as expected.

      VMs on a failed node get restarted.

      VMs can vMotion from host to host.

      HA works.

      Read cache works.

       

      The "Flash read cache reservation" is set to 15% in the default vSAN policy.

      I'm not sure where the write cache setting is.
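
      (For reference, the default policy values can at least be dumped from the ESXi shell; a sketch using the stock esxcli vsan namespace. I believe the read cache reservation shows up as cacheReservation in the output, and I haven't found an equivalent knob for the write buffer.)

        # Dump the default vSAN policy applied to each object class (vdisk, vmnamespace, etc.)
        esxcli vsan policy getdefault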

       

      But write cache performance is really crappy; compared to read, it's glacially slow.

      When I run an SQLIO read test, I get ~225MB/s and 3,538 IOPS. That's expected, right?

      In the read test, I can see vSAN traffic on the NICs at ~119,000 KBps.

       

      When I run an SQLIO write test, I get 10.75MB/s and 172 IOPS. That's not good.

      In the write test, the vSAN traffic on the NICs is only ~181 KBps.......Booooo, not good.
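
      For context, the read and write passes are the same SQLIO invocation with only the -kR/-kW switch flipped; something along these lines (the block size, thread count, duration, and test file path are examples, not necessarily the exact parameters used):

        :: Random read pass: 64KB blocks, 4 threads, 8 outstanding IOs, unbuffered, with latency stats
        sqlio -kR -frandom -b64 -t4 -o8 -s120 -BN -LS E:\testfile.dat

        :: Identical pass, writes instead of reads
        sqlio -kW -frandom -b64 -t4 -o8 -s120 -BN -LS E:\testfile.dat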

       

      I'm a little stumped, as I thought the vSAN write cache would improve write IOPS.........lol, or am I wrong?

       

      Thanks

        • 1. Re: vSAN 6.0 LAB - SSD Write Cache performance
          zdickinson Expert

          What are the make and model of the SSDs and the controller?  It would be interesting to destroy a disk group, create a VMFS datastore directly on the SSD, move a VM there, and run the same test.  Thank you, Zach.

          • 2. Re: vSAN 6.0 LAB - SSD Write Cache performance
            COS Master

            Well.......lol

            This is all lab gear, so the SSDs sit on the onboard P410i controller. Each host has two 120GB SATA III SSDs in RAID 0 (I know, not supported....lol), either Patriot Blaze or Corsair. Each pair is the same model: one server has a pair of Corsairs, another has a pair of Patriot Blaze, and the third has a Corsair pair as well.

             

            I used the VMware KB article "Enabling the SSD option on SSD based disks/LUNs that are not detected as SSD by default" to get the SSD RAID 0 volumes recognized and used as SSDs. It works, and ESXi sees them as SSDs.
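
            For anyone wanting to do the same, that KB boils down to adding a PSA claim rule that tags the logical volume as SSD; roughly this (the naa ID is a placeholder for your own RAID 0 device):

              # Tag the local RAID 0 volume as SSD via a SATP claim rule
              esxcli storage nmp satp rule add --satp VMW_SATP_LOCAL --device naa.xxxxxxxxxxxx --option "enable_ssd"

              # Reclaim the device so the rule takes effect (or just reboot the host)
              esxcli storage core claiming reclaim -d naa.xxxxxxxxxxxx

              # Confirm the device now reports "Is SSD: true"
              esxcli storage core device list -d naa.xxxxxxxxxxxx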

             

            As a sanity check, I actually blew away one of the hosts, loaded Windows 2012 R2 on the bare metal, and ran the same SQLIO test.

            I got roughly twice the read performance (~400MB/s), and writes were a bit lower at ~340MB/s. So to me the RAID sets work.

             

            As an additional sanity check, I loaded ESXi on the hardware, mounted the SSD LUN as a datastore, created a 2012 R2 VM on that datastore, and ran the same SQLIO test. I got slightly lower numbers, but still in the upper 300MB/s range.

            So, again, to me the RAID sets function as designed.

             

            I originally thought the NICs might be syncing at 100Mb/s, but I checked them all and they are all at 1000Mb/s. And if that were the case, the read IOPS would be slow too.
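
            (For what it's worth, the quick way I know to confirm link speed from each host is from the shell:)

              # Shows link state plus negotiated speed and duplex for every vmnic
              esxcli network nic list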

             

            Thanks

            • 3. Re: vSAN 6.0 LAB - SSD Write Cache performance
              zdickinson Expert

              Have you disabled all the caching on the controllers?  Or, if it cannot be disabled, set it to 100% read?  Thank you, Zach.

              • 4. Re: vSAN 6.0 LAB - SSD Write Cache performance
                COS Master

                Caching is disabled only on the SSD LUNs.

                It's still enabled for the 10K RPM spindle drives.

                 

                From experience, if I disable it on the spindle drives, my IOPS will suffer greatly.

                 

                But since I'm always up for the good ol' college try, I will disable it on all the nodes and test it.
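
                For reference, this is roughly how I toggle it per logical drive with the HP CLI (hpssacli, or hpacucli on older bundles); the controller slot and logical drive numbers are just examples for my layout:

                  # Turn the P410i array accelerator (controller cache) off for a given logical drive
                  hpssacli ctrl slot=0 ld 1 modify arrayaccelerator=disable
                  hpssacli ctrl slot=0 ld 2 modify arrayaccelerator=disable

                  # Verify the cache settings afterwards
                  hpssacli ctrl slot=0 show config detail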

                 

                Thanks

                • 5. Re: vSAN 6.0 LAB - SSD Write Cache performance
                  COS Master

                  As I expected, I/O was slower.

                  Read went down to 295MB/s.

                  Write went down to 7MB/s.

                   

                  Thanks

                  • 6. Re: vSAN 6.0 LAB - SSD Write Cache performance
                    zdickinson Expert

                    Gotcha; agreed, disabling cache for just the SSD RAID 0 should have been enough.  I guess I'm out of ideas.  Support ticket?  Thank you, Zach.

                    • 7. Re: vSAN 6.0 LAB - SSD Write Cache performance
                      COS Master

                      Just curious, what are you seeing in performance with your vSAN setup?

                       

                      Thanks

                      • 8. Re: vSAN 6.0 LAB - SSD Write Cache performance
                        zdickinson Expert

                        If we're going for throughput, 800-900 MB/s.  If we're going for IOPS, 50k-60k.  Dell 820s, 700 GB Micron PCIe SSD, LSI 9207-8i, 1 TB Seagate, 10 Gb NICs, 10 Gb Dell switches.  Thank you, Zach.

                        • 9. Re: vSAN 6.0 LAB - SSD Write Cache performance
                          jonretting Enthusiast

                          You might want to take a look at VSAN Observer, and especially "esxtop". I am reasonably sure your SSDs are choking and the 1GbE vSAN NICs are completely saturated. When the vSAN network is bottlenecked, vSAN tends to miss most read cache hits. In my experience, I haven't been able to get 1GbE to be enough even in a lab. To be completely honest, I tried my darnedest to get things to work on higher-end consumer gear and couldn't.

                          • 10. Re: vSAN 6.0 LAB - SSD Write Cache performance
                            jonretting Enthusiast

                            Totally agreed. A 9207-8i, 10Gb NICs, and a PCIe (NVMe) SSD is the way to go. The 9207-8i with the proper firmware/drivers is rock solid.

                            • 11. Re: vSAN 6.0 LAB - SSD Write Cache performance
                              COS Master

                              I'm a little confused as to how a set of SSDs in RAID 0 can be as slow as 7-10MB/s on writes yet hit the mid-300MB/s range on reads.

                              If the NICs were saturated, the reads would be just as slow as the writes, and that's not the case, unless I'm misunderstanding something. If I am, please educate me, as I'm still fairly new to vSAN.

                               

                              I'm going to baseline SQLIO on the SSD LUNs on each host three ways: on a native Windows 2012 R2 install on the hardware, on a 2012 R2 VM under ESXi on the SSD LUN datastore, and on a 2012 R2 VM on a vSAN datastore with all 3 hosts contributing (after a rebuild).

                               

                              I'll post my results; it may take a while.

                               

                              Unfortunately, the hardware I'm using is what's available for the proof of concept.

                              • 12. Re: vSAN 6.0 LAB - SSD Write Cache performance
                                jonretting Enthusiast

                                Are you running VSAN Observer, or watching "esxtop" on each host? SSH into each host, run "esxtop", press "x" to show the VSAN detail, and press "n" to show the network stack detail.
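
                                Roughly what that looks like, plus a batch-mode capture and the Observer side (the cluster path, intervals, and output file are just examples):

                                  # Interactive: run esxtop, press "x" for the VSAN view, "n" for the network view
                                  esxtop

                                  # Batch mode: 60 samples at 5-second intervals, saved for later analysis
                                  esxtop -b -d 5 -n 60 > /tmp/esxtop-vsan.csv

                                  # VSAN Observer is started from RVC on the vCenter server
                                  # (web UI on port 8010 by default)
                                  vsan.observer ~/computers/<cluster-name> --run-webserver --force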

                                • 13. Re: vSAN 6.0 LAB - SSD Write Cache performance
                                  zdickinson Expert

                                  Well, every write goes to the SSD first and is then destaged to disk.  Are you filling up the write buffer so that it's choking on destaging to the HDD?  With a stripe width of one, I could see those kinds of speeds trying to write to a single spinning disk.  Only 30% of the cache device is dedicated to writes.  Thank you, Zach.
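
                                  To put rough numbers on that (assuming ~64KB IOs, which your figures imply: 225 MB/s / 3,538 IOPS ≈ 64KB, and 10.75 MB/s / 172 IOPS ≈ 64KB): a single 10K RPM SAS drive is only good for roughly 130-150 random IOPS, so once the write buffer fills and everything is destaging to one spindle (stripe width 1), ~172 write IOPS is about what you'd expect to see.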

                                  • 14. Re: vSAN 6.0 LAB - SSD Write Cache performance
                                    jonretting Enthusiast

                                    Looking at your post again... I have a strong feeling esxtop will reveal very high latencies for your SATA SSDs. Switching to M.2 PCIe Samsung SSDs would improve your situation and could probably saturate your 1GbE NICs. It would also reduce the load on your HBA. If all goes well, you would see relatively consistent sustained IO of 30-120 MB/s. However, latencies would increase drastically under component re-syncs and other background object operations, resulting in an unworkable experience of 10-20 MB/s with 300ms latencies. In my experience messing with my lab hardware, only SAS/NVMe flash can mitigate this, stabilizing client IO at around 120 MB/s. You may also find yourself tinkering with link aggregation and various types of LACP load balancing, none of which, in my experience, does anything to alleviate the 1GbE vSAN bottleneck between hosts.

                                     

                                    EDIT: If you're set on your RAID 0 SATA SSDs, make sure both the physical disk cache and the logical disk cache are disabled.

                                     

                                    You will also want to play around with various controller firmware, driver, and SSD firmware versions. The effort could result in a slightly better experience, but nothing to write home about, IMHO.

                                     

                                    Be well.
