11 Replies Latest reply on May 27, 2009 2:37 PM by prisoner881

    Max number of VM's before Server 2.0 gets sluggish

    prisoner881 Enthusiast

       

      I've got the following hardware:

       

      • 2x Quad core Opteron 2.6GHz

      • 32GB RAM

      • 8x 150GB Western Digital Raptor 10K drives in RAID10 on Areca SATA RAID controller


      All in all, a stout box.  I've got 11 VM's running on it.  Seven are Windows 2003 R2 (mix of 32-bit and 64-bit).  Four are Linux (CentOS and Ubuntu, all 32-bit).  Of the Windows VM's, one is an Exchange 2007 server with 6GB RAM serving 40 users.  The host runs with about 40% CPU utilization during work hours (about 10% off hours), and memory load is about 24GB out of the 32GB total.  You'd think there was plenty of room for expansion.


      However, adding a 12th VM does something ugly to performance.  CPU doesn't spike, and memory is well within limits.  Disk latency seems to go through the roof, though.  Doing even basic stuff like booting a Linux VM takes several minutes instead of less than a minute.  What's odd is that the disks aren't thrashing.  Disk activity looks pretty normal, not maxed out.  They're just acting really, really slow, which slows down every VM on the box.
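
      (To put hard numbers on that latency, something like the sketch below logs the standard PhysicalDisk counters to a CSV for later review.  typeperf ships with Windows Server 2003; the little Python wrapper and the output file name are just illustrative.)

        import subprocess

        # Standard Windows PhysicalDisk counters.  "Avg. Disk sec/Read" and
        # "Avg. Disk sec/Write" are per-I/O latency; tens of milliseconds or
        # more under load means the storage is slow to respond, even if the
        # queue-length counters look modest.
        counters = [
            r"\PhysicalDisk(_Total)\Avg. Disk sec/Read",
            r"\PhysicalDisk(_Total)\Avg. Disk sec/Write",
            r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
            r"\PhysicalDisk(_Total)\Disk Transfers/sec",
        ]

        # Sample every 5 seconds for 10 minutes, write CSV (-y = overwrite).
        subprocess.run(["typeperf", *counters,
                        "-si", "5", "-sc", "120",
                        "-f", "CSV", "-o", "disklat.csv", "-y"])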


      I decided to test whether it was CPU, RAM, or disk causing this.  I set up an Openfiler SAN on a separate box, mapped a drive to it using the built-in Windows iSCSI initiator, and put the 12th VM on it.  This time, it booted fine and the other VM's were unaffected.  I loaded two more VM's onto the iSCSI volume and they still didn't affect performance.  So the problem seems rooted in maxing out the locally-attached storage in some way.  But this doesn't make sense, because the Windows performance counters don't show high numbers for outstanding disk I/O's.  I'm stumped.
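
      (For anyone who wants to repeat the experiment: the built-in Microsoft initiator can be driven from the command line via iscsicli.  Below is a rough sketch of the steps, wrapped in Python only for readability -- the portal address and target IQN are made-up placeholders, so substitute your own.)

        import subprocess

        PORTAL = "192.168.1.50"                           # hypothetical Openfiler box
        TARGET = "iqn.2006-01.com.openfiler:tsn.vmstore"  # hypothetical target IQN

        # Register the target portal, list what it exports, then log in.
        # iscsicli ships with the Microsoft iSCSI initiator on Win2k3.
        subprocess.run(["iscsicli", "QAddTargetPortal", PORTAL], check=True)
        subprocess.run(["iscsicli", "ListTargets"], check=True)
        subprocess.run(["iscsicli", "QLoginTarget", TARGET], check=True)

        # After login the LUN shows up as a new disk; format it and point
        # the VM's datastore at it.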


      Has anyone else run into some kind of limit on how many VM's you can run on VMware Server 2.0?  Is there some kind of tuning I could do that might help this situation?


        • 1. Re: Max number of VM's before Server 2.0 gets sluggish
          wila Guru
           vExpert · Community Warriors · User Moderators

          Hi,

           

          Does your Areca controller have a BBU?

           Without a BBU it does make sense that you hit the limit on your storage first, as the controller will spend a lot of time writing to disk, and SATA isn't very performant.

           There is a reason that people run SCSI/SAS instead of SATA even with the big difference in price.

           The BBU might mitigate that for a bit.

           

           Also, what flavor of Windows is the host OS that you are running?



          --

          Wil

          _____________________________________________________

          Visit the VMware developers wiki at http://www.vi-toolkit.com

          • 2. Re: Max number of VM's before Server 2.0 gets sluggish
            prisoner881 Enthusiast

            Yes, it does.  However, the cache is pretty small and shouldn't take long to dump to disk.  Even if it didn't have a BBU, the entire server is on a big UPS that gracefully shuts down the host in the event of an extended outage.  Host OS is Win2k3 R2 64-bit.


            None of this affects performance, though.  BBU doesn't do squat when it comes to performance.  It only comes into play if the power is lost.


            And as for SCSI/SAS vs. SATA, I'd agree with you if the performance counters showed the disks being bottlenecked by I/O requests.  They don't.  The performance counters show moderate but not excessive traffic.  I'm still waiting for a good explanation.

            • 3. Re: Max number of VM's before Server 2.0 gets sluggish
              asatoran Champion

               prisoner881 wrote:

               I decided to test whether it was CPU, RAM, or disk causing this.  I set up an Openfiler SAN on a separate box, mapped a drive to it using the built-in Windows iSCSI initiator, and put the 12th VM on it.  This time, it booted fine and the other VM's were unaffected.  I loaded two more VM's onto the iSCSI volume and they still didn't affect performance.  So the problem seems rooted in maxing out the locally-attached storage in some way.  But this doesn't make sense, because the Windows performance counters don't show high numbers for outstanding disk I/O's.  I'm stumped.

               

              And as for SCSI/SAS vs. SATA, I'd agree with you if the performance counters showed the disks being bottlenecked by I/O requests.  They don't.  The performance counters show moderate but not excessive traffic.  I'm still waiting for a good explanation.

               

               I'll play Devil's Advocate.  Doesn't your evidence indicate that it's not the number of VMs, but the number of VMs accessing the SATA storage?  You ran the 12th VM on the same datastore and performance was poor.  But you ran the 12th & 13th VMs off of different storage without a problem.  So doesn't that indicate that it's a limit on SATA or your controller?  Not knowing why doesn't mean it's not true.  Maybe Windows' performance counters are wrong.  They're probably only displaying what's going to the cache, not what the cache is reading/writing to the actual disk.  (Just a guess.  Correct me if I'm wrong.)

               

               As stated, there's a reason why servers get SCSI/SAS.  I don't fully understand the technicals (perhaps another member can), but from what I'm told, SCSI/SAS does random access and multiple simultaneous accesses better, while SATA does sequential access better.  (Again, correct me if I'm wrong.)  So having multiple datastores would be the workaround for using SATA drives with that many VMs.  (i.e.: multiple RAID1 arrays instead of one RAID10.  But you may have to have separate controllers.)  This is what you did when you used the NAS to store the 12th & 13th VMs.  Multiple datastores using multiple controllers (iSCSI & SATA).

              • 4. Re: Max number of VM's before Server 2.0 gets sluggish
                wila Guru
                 Community Warriors · vExpert · User Moderators

                 prisoner881 wrote:

                 None of this affects performance, though.  BBU doesn't do squat when it comes to performance.  It only comes into play if the power is lost.

                 Well, most controllers will not allow you to enable write-back cache UNLESS there's a BBU on the controller.  So check your controller settings and make sure that the write-back cache is enabled.



                --

                Wil

                _____________________________________________________

                Visit the VMware developers wiki at http://www.vi-toolkit.com

                • 5. Re: Max number of VM's before Server 2.0 gets sluggish
                  prisoner881 Enthusiast

                  The Areca controllers will allow write-back caching with or without a BBU, so that isn't it.  And write-back caching is enabled for the array.

                  • 6. Re: Max number of VM's before Server 2.0 gets sluggish
                    prisoner881 Enthusiast

                     

                     asatoran wrote:

                     I'll play Devil's Advocate.  Doesn't your evidence indicate that it's not the number of VMs, but the number of VMs accessing the SATA storage?  You ran the 12th VM on the same datastore and performance was poor.  But you ran the 12th & 13th VMs off of different storage without a problem.  So doesn't that indicate that it's a limit on SATA or your controller?  Not knowing why doesn't mean it's not true.  Maybe Windows' performance counters are wrong.  They're probably only displaying what's going to the cache, not what the cache is reading/writing to the actual disk.  (Just a guess.  Correct me if I'm wrong.)

                     As stated, there's a reason why servers get SCSI/SAS.  I don't fully understand the technicals (perhaps another member can), but from what I'm told, SCSI/SAS does random access and multiple simultaneous accesses better, while SATA does sequential access better.  (Again, correct me if I'm wrong.)  So having multiple datastores would be the workaround for using SATA drives with that many VMs.  (i.e.: multiple RAID1 arrays instead of one RAID10.  But you may have to have separate controllers.)  This is what you did when you used the NAS to store the 12th & 13th VMs.  Multiple datastores using multiple controllers (iSCSI & SATA).

                    The evidence seems to convict the controller, I agree, but it's only circumstantial evidence.  I've never yet encountered an issue with perfmon showing incorrect info on physical disk drives.  I have had them show nothing at all with some RAID controllers, but that's an obvious anomaly.  As a rule, if the counters work at all, they show accurate information.


                     Regarding SCSI/SAS vs. SATA, I am familiar with the technicals.  SCSI/SAS/FC typically offers better performance under heavy multi-user workloads due to how the protocol and bus are implemented.  In some cases the physical drives have significant tuning done to optimize for high multi-user loads, whereas SATA is generally optimized for single-user or low multi-user loads.  I would expect a SATA RAID array to fall down before a SCSI/SAS/FC array.  What I wouldn't expect is for a high-performance SATA RAID array to go from excellent performance to falling flat on its face simply because I added one more VM -- and a lightweight, low-impact VM at that.  I would expect performance to tail off somewhat gradually as load was increased instead of falling off a cliff.


                    There's one more item I failed to mention, and I think it's crucial: while the VM's are acting all sluggish and unhappy, host disk access remains snappy.  I can happily copy a 9GB file from the server to my workstation at about 70MB/sec over Gigabit while the VM's are acting like they're stuck in molasses.  This is the strange part that has me baffled. If the array truly were saturated, I'd expect host disk access to be just as ugly as the VM's.  Only it's not.
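
                     (One way both observations could be true at once: a big sequential copy and a pile of small random reads stress an array very differently.  A crude sketch of a comparison is below -- the path and sizes are arbitrary, and the OS file cache will flatter the numbers unless the test file is much larger than RAM.)

                       import os
                       import random
                       import time

                       TEST_FILE = r"D:\testfile.bin"   # hypothetical path on the RAID10 array
                       FILE_SIZE = 8 * 1024 ** 3        # 8 GB test file
                       BLOCK = 4096                     # small, VM-style random I/O

                       # Create the test file once.
                       if not os.path.exists(TEST_FILE):
                           with open(TEST_FILE, "wb") as f:
                               chunk = b"\0" * (4 * 1024 * 1024)
                               for _ in range(FILE_SIZE // len(chunk)):
                                   f.write(chunk)

                       # Sequential: read the whole file in 1 MB chunks (what a
                       # big file copy does).
                       t0 = time.time()
                       with open(TEST_FILE, "rb") as f:
                           while f.read(1024 * 1024):
                               pass
                       seq = FILE_SIZE / (time.time() - t0) / 1024 ** 2
                       print("sequential: %.0f MB/s" % seq)

                       # Random: 2000 single-block reads at random offsets (what
                       # a dozen busy VMs look like to the array).
                       t0 = time.time()
                       with open(TEST_FILE, "rb") as f:
                           for _ in range(2000):
                               f.seek(random.randrange(FILE_SIZE // BLOCK) * BLOCK)
                               f.read(BLOCK)
                       print("random 4K: %.2f ms per read" % ((time.time() - t0) / 2000 * 1000))

                     (If the sequential number stays high while the per-read latency balloons, the array can look healthy to a file copy and still be on its knees for VM-style I/O.)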


                     I'll again pose the question I started this thread with:  is anyone running more than 12 VM's on their VMware Server 2.0 install?  If so, what's your performance like?


                    • 7. Re: Max number of VM's before Server 2.0 gets sluggish
                      wila Guru
                       User Moderators · vExpert · Community Warriors

                      What is your host OS?

                      --

                      Wil

                      _____________________________________________________

                      Visit the VMware developers wiki at http://www.vi-toolkit.com

                      • 8. Re: Max number of VM's before Server 2.0 gets sluggish
                        AGTCooke Novice

                         prisoner881 wrote:

                         However, adding a 12th VM does something ugly to performance.

                         

                        RACDBA and I had a similar thread going here, but it died out.

                         

                        *Bump*, and here's hoping you get somewhere with this. I'd like some answers too.

                        • 9. Re: Max number of VM's before Server 2.0 gets sluggish
                          prisoner881 Enthusiast

                          Host OS is Win2k3 R2 x64.

                          • 10. Re: Max number of VM's before Server 2.0 gets sluggish
                            wila Guru
                             User Moderators · Community Warriors · vExpert

                             OK, so a supported server-based OS.  In that case I do not think there is a fixed maximum number of virtual machines before the host gets sluggish.

                             It depends on a number of factors; the hardware limits set how many VMs can be run.

                             

                             To put it differently - and as your own research has already hinted at - you have a bottleneck that is slowing down the host.

                             Assuming that you didn't change critical defaults, like changing the priority of background services to be less important than foreground applications, my hunch is still that your disks are saturated.

                             

                             Up to a certain degree it actually makes sense that the host still performs better than your guests: if the host became unresponsive, all of the host management would slow down too, and as a result all your VMs would be slow.  So things can look out of balance, e.g. the host getting more priority by default.

                             You don't ever want one overloaded guest to take the host down, do you?  If that were possible, one guest could slow down ALL guests, which isn't a desirable situation.

                             

                             It seems to me that the only solution you have is to raise the limit on your current bottleneck: either add more disks (and/or another controller) or external network storage.

                             

                             Another alternative is to use ESX/ESXi instead, which simply scales better, as it has the hypervisor at the bottom and not a server OS focused/tuned on things other than virtualisation.



                            --

                            Wil

                            _____________________________________________________

                            Visit the VMware developers wiki at http://www.vi-toolkit.com

                            • 11. Re: Max number of VM's before Server 2.0 gets sluggish
                              prisoner881 Enthusiast
                              wila wrote:

                               OK, so a supported server-based OS.  In that case I do not think there is a fixed maximum number of virtual machines before the host gets sluggish.

                               It depends on a number of factors; the hardware limits set how many VMs can be run.

                               To put it differently - and as your own research has already hinted at - you have a bottleneck that is slowing down the host.

                               Assuming that you didn't change critical defaults, like changing the priority of background services to be less important than foreground applications, my hunch is still that your disks are saturated.

                               Up to a certain degree it actually makes sense that the host still performs better than your guests: if the host became unresponsive, all of the host management would slow down too, and as a result all your VMs would be slow.  So things can look out of balance, e.g. the host getting more priority by default.

                               You don't ever want one overloaded guest to take the host down, do you?  If that were possible, one guest could slow down ALL guests, which isn't a desirable situation.

                               It seems to me that the only solution you have is to raise the limit on your current bottleneck: either add more disks (and/or another controller) or external network storage.

                               Another alternative is to use ESX/ESXi instead, which simply scales better, as it has the hypervisor at the bottom and not a server OS focused/tuned on things other than virtualisation.

                              I'm rapidly moving in that direction.  I've got ESXi 4 in testing now.  VMware Server 2.0 seems like a good product, but community support seems anemic at best.  The ESX/ESXi/vSphere communities seem much more active and knowledgeable.