9 Replies Latest reply on Jun 18, 2020 2:36 AM by SanderHogewerf

    Random Writes low Throughput

    SanderHogewerf Novice

      Hi,

       

      We just implemented a new vSAN cluster (6.7) based on VxRail. It is an all-flash cluster with NVMe cache disks.

      I now see random read throughput about 4 times as high as random write throughput.

       

      My first question is, is this normal?

      Second question, are there some standard steps to increase the Random Write Throughput?

       

      I'd like to hear from you, and thank you!

        • 1. Re: Random Writes low Throughput
          srodenburg Hot Shot
          vExpert

          Hi Sander,

           

          You are saying "Random Read of 4 times as high as the Random Write speed" but also asking "are there some standard steps to increase the Random Write Throughput?"

          That does not add up. With your first question, did you mean to ask why writes are 4 times as high as reads?

           

          Are you running a normal cluster or a stretched cluster?

          Is read-locality active or not?

           

          • 2. Re: Random Writes low Throughput
            SanderHogewerf Novice

            Hi,

             

            The reads are 4 times higher than the Writes, not the other way around.

             

            It is a stretched cluster.

             

             

            For example the above storage policy is used.

            • 3. Re: Random Writes low Throughput
              TheBobkin Virtuoso
              VMware Employees, vExpert

              Hello SanderHogewerf,

               

              Are you perchance looking at VMs registered and running on Non-Preferred/Secondary site where their data resides ONLY on Preferred site?

               

              Bob

              • 4. Re: Random Writes low Throughput
                SanderHogewerf Novice

                Hi Bob,

                 

                No, the VM is also on the preferred site, just as the data is.

                 

                Sander

                • 5. Re: Random Writes low Throughput
                  srodenburg Hot Shot
                  vExpert

                  Then it's simple. It's a stretched cluster and, assuming read locality is active (it should be), write IOs have to go to both sites while read IOs only come from the site the VM is running in.

                  That is, if everything is set up correctly. I seriously wonder whether yours is, because why would you use a stretched cluster when the data is stored in only one site?

                   

                  So, assuming I'm right, your storage policy is all wrong. You are not mirroring across both sites now (Site disaster tolerance = none). You are only mirroring locally within one site. It should be at least the other way around:

                  Site disaster tolerance = "Dual Site Mirroring" meaning Geo protection.

                  Failures to tolerate = None (for "Geo Only" aka Stretched cluster mirroring only) or 1 Failure (for Geo **AND** local mirroring).

                   

                  In the case of Geo mirroring **AND** local mirroring, 100GB worth of data will be stored 4 times. 2x in each site so be aware of the storage costs. Such policies should be driven by application workload availability requirements.
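The storage cost mentioned above can be sketched as simple arithmetic. This is a rough illustration only, assuming RAID-1 mirroring at every level and ignoring witness components and any other overhead; the function and parameter names are hypothetical and not part of any vSAN API.

```python
def raw_capacity_gb(vm_data_gb, dual_site_mirroring, failures_to_tolerate):
    """Estimate raw capacity consumed by a VM under RAID-1 mirroring.

    Assumptions (not from any vSAN API): every copy is a full mirror,
    and witness components and metadata overhead are ignored.
    """
    # Dual site mirroring keeps the data in both sites; otherwise one site only.
    site_copies = 2 if dual_site_mirroring else 1
    # With RAID-1, tolerating N local failures needs N + 1 copies per site.
    local_copies = failures_to_tolerate + 1
    return vm_data_gb * site_copies * local_copies


if __name__ == "__main__":
    policies = [
        ("Geo only (dual site, FTT=0)", True, 0),
        ("Geo AND local (dual site, FTT=1)", True, 1),
        ("Local only (single site, FTT=1)", False, 1),
    ]
    for label, dual_site, ftt in policies:
        print(f"{label}: {raw_capacity_gb(100, dual_site, ftt)} GB raw for 100 GB of data")
```

The "Geo AND local" case is the one where 100GB of data ends up stored 4 times.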

                   

                  Fix your policy first. Make a new one and apply it to one VM at a time (or at least not to all of them at once, as that is IO suicide). After all VMs are correctly protected, re-evaluate. But what you will always see is that writes are much slower than reads, because writes have to be written in both sites while reads are served locally and are therefore much faster.

                  After all is done, make the new policy the default policy for the datastore so that mistakes cannot happen again.
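Why writes trail reads under a dual-site mirrored policy can be shown with a toy latency model. This is a simplified sketch, not a description of vSAN internals: it assumes a write is acknowledged only once both sites have it, that a read is served entirely from the local site, and the latency figures are made-up placeholders rather than measurements.

```python
def write_latency_us(local_device_us, remote_device_us, inter_site_rtt_us):
    """A mirrored write completes only when the slowest replica has acknowledged.

    The remote replica pays the inter-site round trip on top of its device latency.
    """
    return max(local_device_us, remote_device_us + inter_site_rtt_us)


def read_latency_us(local_device_us):
    """With read locality, a read is served from the site the VM runs in."""
    return local_device_us


if __name__ == "__main__":
    # Placeholder numbers (assumptions): 200 us per device, 1000 us between sites.
    device_us, rtt_us = 200, 1000
    print("read :", read_latency_us(device_us), "us")
    print("write:", write_latency_us(device_us, device_us, rtt_us), "us")
```

In this model the inter-site round trip dominates the write path however fast the local devices are.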

                  • 6. Re: Random Writes low Throughput
                    depping Champion
                    VMware Employees, User Moderators

                    Also, keep in mind that vSAN has a per-host local memory cache of 1 GB, so depending on the workload, your reads could be hitting that cache a lot.

                    • 7. Re: Random Writes low Throughput
                      SanderHogewerf Novice

                      Hi srodenburg

                       

                      Thanks for your answer! I understand it.

                       

                      To give some more information and why the storage policy is chosen this way.

                      The server is part of a cluster which fails over at the application level. So the stretched component is not needed for this server, which is why I keep the data in one site.

                       

                      I increased the stripe width in the policy and then I saw the writes pick up a bit. Does this mean that I'm at the limit of the disk group?

                      • 8. Re: Random Writes low Throughput
                        srodenburg Hot Shot
                        vExpert

                        Not per se, but adding stripes is counterproductive, especially on all-flash. Normally, in a mirror, the data lives on only two capacity devices: one copy on, say, node 3 disk 2 and the other mirror copy on node 6 disk 4 (just an example). So the data only needs to be written to, and read from, 2 devices.

                        If you start striping, say a stripe-width of 2, that means that each mirror copy is now actually made up of 2 parts so 4 parts in total. Now, reads and writes are done from 4 devices. This adds latency. The higher the stripe-width, the higher the latency becomes.
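The device count described above can be written out as a small calculation. This is a simplified sketch, assuming RAID-1 mirroring and ignoring witness components and any extra splitting the system may do on its own; the function name is hypothetical.

```python
def capacity_components(mirror_copies, stripe_width):
    """Count the data parts an IO may touch: each mirror copy is split into stripes."""
    return mirror_copies * stripe_width


if __name__ == "__main__":
    for width in (1, 2, 4):
        parts = capacity_components(mirror_copies=2, stripe_width=width)
        print(f"stripe width {width}: {parts} parts across capacity devices")
```

With two mirror copies, a stripe width of 1 touches 2 devices and a stripe width of 2 touches 4, matching the example in the post.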

                         

                        Striping has only 1 use-case and that is on slow rotational drives where sequential performance (streaming etc.) is important and latency is not.

                        In other words, you just made the latency worse for yourself ;-)

                         

                        Also, from a pure data point of view, it sounds like you have absolutely no use case for a stretched cluster if both sites keep their data to themselves anyway. Why not have 2 regular clusters? Much simpler, with no messing about with a Witness Appliance etc.

                        In an emergency, you won't have the time or the means to move all the data to the other site quickly anyway, because you never replicated a thing to it.

                         

                        Or is it just a small number of VMs which are, policy-wise, not stretched while all other VMs are mirrored across both sites?

                         

                        My advice: use a stripe-width of 1. Flash is so fast that you gain nothing with striping. It just slows everything down, as data needs to be written to and fetched from more devices than necessary.

                        • 9. Re: Random Writes low Throughput
                          SanderHogewerf Novice

                          Thank you very much for your answers! I now know why the writes are that much slower than the reads.

                           

                          The number of machines that have failover at the application level is not large.

                           

                          For now thank you!