32 Replies · Latest reply on Sep 2, 2009 10:25 PM by grantdavies
      • 15. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
        Keith001 Novice

         

        No further changes were needed to anything.  I could not find any way to enable write caching on the RAID controller.  I assume it's an automatic setting which is enabled when the RAID controller detects the battery pack attached to the RAID cache DIMM.

         

         

        The only indication that the 1GB RAID cache with the battery pack was installed (other than a speed increase in writes) was the RAID controller message displayed on the server's console during power-on.  I seem to recall that a message initially displayed after installing the 1GB RAID cache with the battery pack indicated that the battery pack was not fully charged and that RAID performance would not be optimal until it was.

         

         

        • 16. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
          PeteLong Novice

          Downgrading the firmware on the 410i card from 1.62 to 1.58C cured the problem for me.

          • 17. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
            Keith001 Novice

             

            It should have been a sequential write test.  I tested write performance by observing the VI Client performance chart for disk I/O while creating a 1 GB file on a RHEL4 VM.  I believe the command was "dd if=/dev/zero of=/tmp/test.txt bs=1k count=1000000".
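A caveat with this kind of test: the guest's page cache can absorb the writes and make the figure look better than the disk really is. A minimal variant (my sketch, assuming GNU dd on the RHEL4 guest) that flushes the data before the clock stops:

```shell
# Time the write plus an explicit sync, so cached-but-unwritten data
# is flushed to disk before the timer stops (GNU coreutils assumed).
time sh -c 'dd if=/dev/zero of=/tmp/test.txt bs=1k count=1000000 && sync'
rm -f /tmp/test.txt
```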


            As far as partition alignment issues are concerned, I have no idea if that's a possibility.  However, the datastore creation and VM creation/installation were performed via the VI Client.

             

             

            • 18. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
              seocsitvmw Lurker

               

              Hi, same slow disk performance problem here. I've downgraded the firmware to 1.58C, but this is the rate I still get...

               

               

              time dd if=/dev/zero of=/tmp/test.txt bs=1k count=1000000

              1000000+0 records in
              1000000+0 records out

              real    1m1.819s
              user    0m0.300s
              sys     0m2.380s

              The I/O is less than 20MB/s... Can you try the same test and paste your output here?
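Working the numbers from that run: 1,000,000 one-KB blocks written in 61.8 seconds of wall time comes out just under 16 MB/s, consistent with the "less than 20MB/s" observation:

```shell
# ~1,000,000 KB written in 61.8 s (the "real" time from the dd run above).
awk 'BEGIN { printf "%.1f MB/s\n", 1000000 / 1024 / 61.8 }'
# prints: 15.8 MB/s
```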

               

              Thanks in advance!


              • 19. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                J1mbo Virtuoso

                 

                Might be worth retesting with a bigger block and file size.  On my test environment (Perc 5i RAID-5, 3x 500GB SATA), a 1K block size with a 1GB file gives 15MB/s; a 64K block with a 1GB file gives >250MB/s (the effect of the 512MB write cache); a 64K block with a 2GB file gives ~60MB/s, which is more realistic.  Note this is dd for Windows, though.
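On a Linux guest, the equivalent retest might look like this (my sketch; 32768 x 64 KB = 2 GB, sized to overflow a 512 MB controller cache so the sustained rate shows through):

```shell
# 64 KB blocks, 2 GB total - large enough to blow through a 512 MB
# write cache and measure sustained, not cached, write speed.
time dd if=/dev/zero of=/tmp/test64k.bin bs=64k count=32768
rm -f /tmp/test64k.bin
```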

                 

                 

                • 20. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                  patparks1 Enthusiast

                   

                  Thanks for posting that adding the BBWC helped improve your performance.  I have 3 ESXi 3.5 boxes, all HP DL360 G5s, all with similar slowness.  I've asked my manager to purchase these BBWCs to improve the situation.  It's just good to see another confirmed instance where this resolved the issue.


                  • 21. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                    patparks1 Enthusiast

                     

                    We got our BBWC in this week and installed it.  After about 24 hours, we tested write speeds and are now seeing approx 60MB/s on our RAID5 array, which we can live with.  Sure beats the pants off 4MB/s.


                    • 22. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                      Rabie Enthusiast

                      Just a note: we have an open call with HP on the P410 Smart Array. Under certain specific conditions on a RAID 5 array with 512MB cache, we have managed to reliably hang the RAID controller under both Windows 2003 and RHEL 5.

                      On Windows, using IOMeter, we have seen brilliant performance with read-only or write-only I/O, but with small combined read/write I/O the performance VERY quickly degrades, to the point where after about 30 minutes the disks stop responding.

                       

                      I would suggest you log a call with HP.

                      • 23. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                        Paul Mead Lurker

                        I have recently rolled out an HP ML330 G6 server with a P410 zero-cache controller and 3x 250Gb SATA near-line disks; I am running a virtualised version of the old client server as the main VM (the only VM at present), created with VMware Converter - all went reasonably well. The disk arrangement is two disks mirrored, with the 3rd as a hot spare.

                         

                        The first half-morning of semi-live tests indicated a performance issue, which I fairly quickly put down to write caching: Windows Performance Monitor showed a disk queue length >50 for much of the time, and IOMeter showed disk throughput lower than I would expect.


                        I have just discovered that HP ship all their disks with the onboard DISK write cache switched off (I am guessing this is just in case you are using a zero-MB controller with no battery-backed cache, and maybe do not run the server on a UPS either). I guess I must have missed the great big NEON sign that said "for customer safety we have ensured that your write performance will be abysmal - just in case you are silly enough to run without safety nets".


                        I have not been able to go back yet to check whether switching the on-disk caching back on helps (a 4am site visit tomorrow will give us the results - argh). I hope the above diatribe and debrief helps someone else.


                        Of course, I am wondering whether people running their servers with decent levels of battery-backed cache may still have the default on-disk cache off - I guess they might never realise the on-disk cache is OFF when they could be enjoying even better performance - what a waste that would be.

                         

                         

                        Makes you wonder why anyone works in IT - when the default settings make an expensive SAS raid controller running mirrored disks slower than a very slow thing indeed.

                         

                         

                        I will report back late late tomorrow or perhaps even Tuesday when my brain returns from its hiding place. Any feedback welcomed.

                        • 24. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                          J1mbo Virtuoso

                           

                          Unfortunately that server's disk spec is very poor for ESXi.  I read recently that 'near-line' derives from the drives sitting somewhere between on-line and off-line (tape) storage - generally little more than desktop SATA disks.  In any case the controller-based BBWC is critical.  I'm not sure the underlying disk write cache will make any measurable difference once a controller BBWC is installed (particularly with SAS disks).


                          As a point of interest, I performed a clean shutdown on my test ESXi box (which has 512MB BBWC on a Dell Perc 5i), turned it off, and disconnected the battery for a few moments.  The result was that the volume was corrupted and I had to restore all my VMs from backup - I would conclude that enabling write caching anywhere without battery backup is pretty brave.

                           

                           

                          • 25. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                            Jackobli Master
                            Paul Mead wrote:

                            Of course - I am wondering if people are running their server with decent levels of battery backed cache, but that may still have the default ondisk cache off - I guess they might never realise that the ondisk cache is OFF and could be enjoying even better performance - what a waster that would be.

                             

                            Oh boy, this has been discussed and disputed for eons.

                            THOU SHALL NOT USE WRITE CACHE ON THE DISK ITSELF.

                            Anything could happen if your disk controller and your OS think they have written the data when it hasn't actually happened. There is more than just mains power that can fail.

                            Go buy a BBWC for your controller, if you are depending on your VMs and your data.

                            • 26. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                              Paul Mead Lurker

                              Well - I am back; mixed feelings - especially with some of the feedback - thanks to all for your input so far.

                               

                              Additional info: the server was first prepped at our site and then taken to the client site for final config and physical-to-virtual conversion. When the server first came online after the move, but before conversion, one of the 2 mirrored disks had complained and appeared to be performing a rebuild. I let this run overnight before the half-day semi-live run.


                              Tonight I have been over to the client out of hours and run a few tests: IOMeter with 2 workers and 2 access specifications - 32k blocks with 100% read and 0% random, and 32k blocks with 100% write and 0% random - for 1 minute per access specification (2 minutes total); in simple terms, a sequential read test followed by a sequential write test. I then deliberately switched OFF the on-disk caching (which I had in fact switched on during the initial build - I just thought I had not), ran the tests, and then reran them with on-disk caching switched back on, with a FULL physical and virtual server power-down and restart before each test for completeness.
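For anyone without IOMeter to hand, a rough command-line analogue of that run (my sketch, not the original methodology) is a sequential 32 KB write pass followed by a sequential 32 KB read pass:

```shell
# Sequential 32 KB write test (~256 MB), then a sequential 32 KB read
# of the same file - a crude stand-in for the two IOMeter access specs.
dd if=/dev/zero of=/tmp/seqtest.bin bs=32k count=8192
dd if=/tmp/seqtest.bin of=/dev/null bs=32k
rm -f /tmp/seqtest.bin
```

Note the read pass will largely be served from the guest's page cache unless the file exceeds guest RAM or the caches are dropped first, so treat the read figure with suspicion.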

                               

                               

                               

                              Summary Results:

                              OnDisk caching OFF: avg 47 MBps read and avg 5.6 MBps writes
                              OnDisk caching ON: avg 66 MBps read and avg 52 MBps writes

                              So that is looking quite good; unfortunately I do not have exactly the same results for comparison from the previous 24 hours of poor performance, although the closest comparison would seem to be:

                              Bad/weird state with partial caching(?): avg 37 MBps read and avg 29 MBps writes

                              Interim conclusion: I suspect the disk/array problem after the physical server move may have left the on-disk caching in an odd state.
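Those write figures work out to roughly a ninefold gain from the on-disk cache alone:

```shell
# Write throughput ratio, on-disk cache ON vs OFF (figures above).
awk 'BEGIN { printf "write speedup: %.1fx\n", 52 / 5.6 }'
# prints: write speedup: 9.3x
```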

                              What I would say is that although things seem better, at times the server's avg disk queue length is still higher than I would normally be happy with - I suspect we may need to invest in some BBWC kit.

                              Just for completeness: the disks used in this server are 3x 250Gb "Midline" disks, pn: 458926-B21.

                              If anyone has the same setup but with 256 or 512Mb cache, with or without battery backup, I would be very interested to see your results; I can send the IOMeter settings file across if desired.

                              My own virtualised server, running in a similar fashion on an LSI MegaRAID SATA 300-8XLP SATA II RAID adapter with 128Mb embedded cache and 2x 1.5TB (pn: WD15EADS) mirrored disks, gets figures of 92MBps read and 110MBps write! Seems odd, but it appears nippy enough in practice, with no long disk queues during any of the normal usage or IOMeter workloads tested so far (obviously not trying hard enough).

                              • 27. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                                J1mbo Virtuoso

                                 

                                The performance stats gathered confirm that write caching (or the lack of it) is the problem.  However, the current configuration is dangerous.

                                 

                                 

                                • 28. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                                  patparks1 Enthusiast

                                   

                                  grantdavies:

                                   

                                   

                                  We recently added the BBWC to a DL360 G5 that was performing slowly in our data centre.  I took the server down to physically install the battery, then booted it with SmartStart to ensure the cache ratios were set to the defaults (75% write and 25% read) - they were already.  I then booted the server back up into VMware ESXi 3.5.

                                   

                                   

                                  Approx 24 hours later, I confirmed that speeds were approx 60-70MB/s, a huge improvement over the previous 2-4MB/s.

                                   

                                   

                                  Thus, I did nothing else other than shut down, plug in the battery, and turn it back on.  It's worth noting too that I had a P400i with 256MB of cache; I left the 256MB of cache in place and just added the BBWC, and our problems are resolved.

                                   

                                   

                                  • 29. Re: ESXi 3.5 U4 on HP DL380 G6 has slow disk performance
                                    Paul Mead Lurker

                                     

                                    Thanks to all, but this was the issue: due to there being no RAID cache on the ML330 G6's P410 controller as standard, there was a significant write performance hit (per the performance utility and disk queue). It was eventually found that the ESET NOD32 antivirus product was using XMON to check the Exchange datastore; every time the AV signature was updated, the entire Exchange datastore would be rechecked! Normally this would not matter, as most of the time our servers and our clients' servers perform optimally and the effect of the scan is masked.

                                    Due to the cache issue, the scan dragged down the entire server's performance. Disk queues could be as high as 40-50 for tens of minutes at a time.

                                    The XMON background scanning can be turned off (in fact some recommend turning it off, despite it being on by default). We have resolved the overall issue by adding 512Mb of battery-backed cache to the controller, and performance is much, much better; I believe that if we had started with the cache, we would never have noticed any issues at all.

                                    Please note that ESET NOD32's XMON is not at fault - it simply highlighted the disk performance issue.

                                     

                                     

                                    My main gripe is that for some reason HP sell the current ML330 G6 generation of servers with NO/zero cache memory as standard on the P410 controller! Silly me thought they would not possibly sell something so handicapped. Not only that, when you decide to upgrade you can purchase either a 256Mb module or 512Mb with battery. Why not 256Mb with battery?

                                     

                                     

                                    FYI we now get disk queues of <2 most of the time, with occasional blips of higher queue levels - but it is like comparing the Norfolk Broads with Wales! IOMeter tests show throughputs of 160MBps reads and 230MBps writes (not bad for a 7200rpm mirrored pair), which compares to the early non-BBWC results of approx 66MBps read and 52MBps writes. The non-BBWC results are worse than the headline values indicate, due to the type of IOMeter test and the fact that the disk queue length was so high during the test.

                                     

                                     

                                    Still a learning exercise for me: (1) ESXi NEEDS a properly cached controller for decent write performance - more so than running a physical box, it would seem - and (2) don't take HP specs at face value.

                                     

                                     

                                    I would love some feedback from HP on this matter, as I was intending to standardise on their kit as we move all our clients to ESXi - i.e. why sell a RAID controller with zero cache memory and not insist on a cache upgrade, or at least make it very clear that you are buying something which for many purposes will be too slow? I sign off somewhat bemused.