36 Replies Latest reply on Jul 2, 2015 12:56 AM by jonretting

    Experiencing random High disk latency with Dell H730P controller and VSAN

    Isalmon Novice

      So we have been experiencing random periods of poor performance on our Dell 730 VSAN cluster. We are using the H730P in HBA mode. Each box has two 400GB SSDs and four 600GB 15K SAS magnetic drives. When it happens, we notice poor performance on the VMs and the vSphere web client (appliance), and when we SSH into each server, run esxtop, and check the disk view, the DAVG is all over the place, from 30 to over 1000 ms. We opened a ticket with VMware and they suggested updating the BIOS and firmware. Dell had a firmware update to deal with high I/O latency, FW update 25.2.2-0004. We updated the firmware and all seemed OK, but the high disk latency still randomly pops up on any one of the servers.

      We are running esxi 5.5 U2 build 2068190.

       

      I know this card was recently certified. Is something amiss? Build version? Crappy firmware? I am prepared to give up on pass-through and redo everything with RAID 0; we have a similar environment using Dell R720s with H710 cards and no issues whatsoever.

        • 1. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
          jonretting Enthusiast

          What driver version are you using for the H730P? VMware shows the latest for 5.5 as "megaraid_perc9 version 6.902.73.00-1OEM".

           

          You might want to try and disable ASPM in the bios. What are the server specs? Without this information I can't really recommend any BIOS changes, as they may not be applicable to your servers. As a crazy long shot you could also try flashing your HBA's with LSI IT Firmware that matches your Dell 730P re-brand...

          • 2. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
            Isalmon Novice

            The servers are Dell PowerEdge 730s, each with two 400GB SATA SSDs, four 600GB SAS drives, and 256GB of RAM.

            The BIOS and firmware were all upgraded to the latest (at least what Dell directed me to). The PERC firmware version is 25.2.2.0004; they had us upgrade from 25.2.1.0037.

             

            Now checking today I saw this..

             

            ESXi 5.5 U2    megaraid_perc9 version 6.902.73.00-1OEM    25.3.0.0015    Partner Async
            ESXi 5.5 U2    megaraid-perc9 version 6.901.57.00-1OEM    25.2.2.0004    Partner Async
            ESXi 5.5 U2    megaraid_perc9 version 6.901.55.00.1vmw    25.2.1.0037    Partner Async

             

             

            Our boxes still have megaraid_perc9 version 6.901.55.00.1vmw. Dell made no mention of this, so I believe the mismatch between driver and firmware could be the issue.
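
            The mismatch check described above can be sketched in a few lines of Python. This is only an illustration: the pairing table is copied from the three rows listed in this post, and the function name `check_pairing` is made up for the example. It does not query the host or the HCL.

```python
# Firmware -> driver pairings exactly as listed in the post above
# (ESXi 5.5 U2, megaraid_perc9). Not an authoritative copy of the HCL.
LISTED_PAIRS = {
    "25.3.0.0015": "6.902.73.00-1OEM",
    "25.2.2.0004": "6.901.57.00-1OEM",
    "25.2.1.0037": "6.901.55.00.1vmw",
}


def check_pairing(firmware: str, driver: str) -> str:
    """Return a short verdict for an installed firmware/driver combination.

    check_pairing is a hypothetical helper for this example only.
    """
    expected = LISTED_PAIRS.get(firmware)
    if expected is None:
        return f"firmware {firmware} is not in the listed pairs"
    if driver == expected:
        return "match"
    return (
        f"mismatch: firmware {firmware} is listed with driver {expected}, "
        f"found {driver}"
    )


if __name__ == "__main__":
    # The situation described in the post: firmware updated, driver left behind.
    print(check_pairing("25.2.2.0004", "6.901.55.00.1vmw"))
```

            The installed versions would have to be read off each host by hand and passed in; the sketch only compares strings.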

            • 3. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
              jonretting Enthusiast

              After updating the drivers, I would strongly suggest rebuilding the entire VSAN array: switch off VSAN on the cluster, delete all partition information on both the magnetic and flash disks, then "mklabel gpt" them. Your current VSAN array was created under non-optimal conditions, and even after the possible driver update "fix" it will still experience problems, causing you to chase your tail. Good luck.

              • 4. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                Isalmon Novice

                I hope I don't have to do that in a production environment. We have hundreds of VM's and people working.

                • 5. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                  jonretting Enthusiast

                  Well, hopefully you don't, and your problems go away. But if they don't, figuring out your next troubleshooting path will be rather difficult without eliminating that possibility. Obviously you could somehow try to gather enough space on an NFS/iSCSI datastore, and just vMotion the VM storage to that. Moreover, I didn't realize the cluster in question is a production environment?

                   

                  Forgot to mention that Log Insight would really help you here, as you could identify any problems in your array to a greater extent. I would definitely do this before scratching the VSAN array if it's production. Needless to say, I would focus on migrating off the problematic storage medium first; better safe than sorry.

                  • 6. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                    Isalmon Novice

                    Yes unfortunately it is a production environment happening in two separate sites.

                    I really regret going with the 730P and using pass-through mode.

                    We have had no issues with the 710 using RAID 0.

                    • 7. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                      jonretting Enthusiast

                      Hmm... You could try the following, but I have never attempted such a thing, nor would I ever in a production environment. But in theory you might be able to rebuild onto RAID 0.

                      1. Put the ESXi host (VSAN contributor) into maintenance mode, ensuring accessibility.
                      2. Shut down the host and install a fresh magnetic disk (same size as or larger than the largest disk in the VSAN).
                      3. Take the system out of pass-through, and enable AHCI.
                      4. Mount a *nix live ISO over IPMI to that machine.
                      5. Do a DD copy of one of the VSAN disks to the new disk you put in.
                      6. Reboot and put that VSAN disk into RAID 0.
                      7. Boot back into the *nix live ISO and DD from the new disk to the new RAID 0 disk.
                      8. Repeat the process for each magnetic disk (with more spare disks added you could in theory do them all at one time).
                      9. Once all of them have been migrated, remove the extra players (spare disks), and boot into ESXi.

                       

                      Once again, I have never done an operation like this on VSAN, so I could be completely wrong. My numbered list probably needs tweaking as well, but I think you get the idea.

                       

                      Just an idea!

                      • 8. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                        Isalmon Novice

                        Thanks for your help.

                         

                        I thought about this as well. My thinking was to do a full data migration on one host, then wipe the disks and configure them as RAID 0, then migrate and repeat. But I am not sure if VSAN will like a mix of RAID 0 and pass-through in a cluster while doing the migration. Let's hope this resolves the issue.

                        • 9. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                          jonretting Enthusiast

                          Yes -- Evacuating a host at a time would be best. Granted, the performance wouldn't be consistent across hosts, but in theory it should work, since eventually all hosts will be RAID0. I don't think there would be a compatibility issue with the mix of IT/RAID between hosts during the transition. Good luck!

                           

                          Forgot to mention that you should find out what type of hit if any you take on Queue Depths going from IT to RAID0.

                          • 10. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                            zdickinson Expert

                            I don't believe RAID0 vs Pass Through will change the 895 queue depth.

                             

                            I would also do the rolling migration from Pass Through to RAID0, especially if you have at least 4 hosts, so that you always maintain FTT = 1.  Thank you, Zach.
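
                            The reasoning behind "at least 4 hosts" can be written out as a small Python sketch. It assumes mirrored VSAN objects, for which tolerating FTT failures needs 2 * FTT + 1 hosts; the function names are made up for the example.

```python
def min_hosts(ftt: int) -> int:
    """Minimum hosts needed for mirrored objects to tolerate `ftt` failures."""
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return 2 * ftt + 1


def can_evacuate_one(total_hosts: int, ftt: int) -> bool:
    """True if one host can be fully evacuated while the remaining
    hosts still satisfy the FTT requirement."""
    return total_hosts - 1 >= min_hosts(ftt)


if __name__ == "__main__":
    for hosts in (3, 4, 5):
        print(hosts, "hosts, FTT=1, can evacuate one:", can_evacuate_one(hosts, 1))
```

                            With FTT = 1 the minimum is 3 hosts, so a 3-host cluster cannot lose one to a rebuild and stay compliant, while a 4-host cluster can.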

                            • 11. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                              maxduncan Novice

                              Hi there, I think I am running into the -exact- same problem as you, and just came across this post.

                               

                              I'm running 4 R730xd's with the PERC H730 Mini. I was having countless problems with SSDs showing as permanently failed, PSODs, etc. when this cluster was built in February of this year. After working exhaustively with vmw, it seemed to have settled down, until just recently. Over the past few weeks, I have observed the cluster slow to an absolutely unusable crawl. Writes went from 230mb/s to 7mb/s, and putting hosts into VSAN maintenance mode is taking hours. I'm at my wits' end, and thankfully found this thread.

                               

                              Server config: ESX 5.5.0 build 2718055 (Dell)

                              - 12 4TB drives and 2 800GB SSD's per host

                              - H730 controllers are in HBA mode

                              - Firmware 25.2.2-0004

                              - Driver 6.901.57.00.1vmw

                               

                              HCL: VMware Compatibility Guide: I/O Device Search

                               

                              I've gone through endless support calls with vmw on the matter, and they had me install the firmware and driver noted above. They also confirmed that the H730P applies to the H730 mini although the HCL does not make note.


                              Any thoughts on the combination above? Greatly appreciate any advice or assistance. This has been a very frustrating journey.  

                              • 12. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                                jonretting Enthusiast

                                Are you using pass-through? The OP sounds like he will be reconfiguring as RAID0; have you tried RAID0? If you are crawling along at 7mb, your latency must be out of control as well. I would gather VSAN stats for one of the hosts, then evacuate it and rebuild it as RAID0, then gather stats again. Make sense? Are you getting sense errors? What errors, if any, are you getting?

                                • 13. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                                  Isalmon Novice

                                  Sorry to hear your woes,

                                   

                                  Well, if you had firmware version 25.2.1.0037 previously, then that is exactly the issue that Dell released 25.2.2.0004 to fix:


                                  PERC H730/H730P/H830 Mini/Adapter /FD33xS/FD33xD RAID Controllers firmware version 25.2.2-0004

                                  Fixes & Enhancements

                                    Fixes:

                                  - Corrects issue with excessive PERC I/O timeouts and SATA SSDs falling offline under heavy I/O.

                                   

                                  Your exact issue.

                                   

                                  Things have settled down for me so far at both sites with this combination:

                                  - Firmware 25.2.2-0004

                                  - Driver 6.901.57.00.1vmw

                                   

                                  Dell had me update to 25.2.2-0004, but I still had the 6.901.55.00.1 driver. I just updated the VIB, and so far so good.

                                  • 14. Re: Experiencing random High disk latency with Dell H730P controller and VSAN
                                    Isalmon Novice

                                    Things have calmed down for now. I will be watching for the next few weeks, but if the latency returns I will switch from pass-through.

                                    I am monitoring the DAVG daily on all boxes and watching with VSAN Observer.
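
                                    A daily DAVG check like the one described above could be scripted against a CSV captured with esxtop in batch mode. The Python sketch below is only an illustration under stated assumptions: it assumes the DAVG columns end with the label "Average Driver MilliSec/Command", it uses 30 ms (the low end of the range reported at the top of the thread) as the threshold, and the sample rows and the `davg_spikes` name are made up.

```python
import csv
import io

# Assumed column label for DAVG in esxtop batch-mode CSV output.
DAVG_SUFFIX = "Average Driver MilliSec/Command"


def davg_spikes(csv_file, threshold_ms=30.0):
    """Yield (timestamp, column_name, value) for each DAVG sample above
    threshold_ms. `csv_file` is any open text file or file-like object."""
    reader = csv.reader(csv_file)
    header = next(reader, None)
    if header is None:
        return  # empty input, nothing to report
    davg_cols = [i for i, name in enumerate(header) if name.endswith(DAVG_SUFFIX)]
    for row in reader:
        for i in davg_cols:
            try:
                value = float(row[i])
            except (IndexError, ValueError):
                continue  # skip short rows and non-numeric cells
            if value > threshold_ms:
                yield row[0], header[i], value


if __name__ == "__main__":
    # Made-up sample data in the assumed layout: timestamp first, then counters.
    sample = (
        'time,"\\\\host1\\Physical Disk Adapter(vmhba0)\\'
        'Average Driver MilliSec/Command"\n'
        '"t1","12.5"\n'
        '"t2","1043.0"\n'
    )
    for timestamp, column, value in davg_spikes(io.StringIO(sample)):
        print(timestamp, column, value)
```

                                    For a real capture, the file would be opened with `open(path, newline="")` and passed in the same way; the column label should be checked against the actual header first.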
