15 Replies. Latest reply on May 17, 2016 10:52 PM by bilbobagginz

    Homelab vSAN woes

    VirtualizingStuff Novice

      Hello Everyone,


      I have been having some vSAN issues where drives show as degraded when they are actually fine, as confirmed by the LSI controller and the hardware status. The drives (SSDs) seem to become degraded once I/O hits the disks (e.g. when creating a VM). All hosts have 2 x SSD, with one of those SSDs tagged as an HDD. I wrote a post about it back in March that has screenshots and more detail. Life has been very busy (chasing my 17-month-old around), but I finally have time to continue troubleshooting. Everything is on the vSAN HCL except the actual SSD drives.

      Below are the hardware specs:


      Supermicro servers (X9SCM-F) x 2:

      • CPU: 2 x Intel Xeon E3-1230v2 “Ivy Bridge”
      • Motherboard: 2 x Supermicro X9SCM-F
      • Raid Controller: 2 x LSI Internal SATA/SAS 9211-8i
      • Memory: 2 x Kingston 32GB Kit DDR3 1600MHz PC3
      • Disks:
        • 2 x Lexar Echo ZX 16GB
        • SSD: 4 x Sandisk Ultra II 240GB
      • Network Cards:
        • 2 x HP Infiniband DDR Dual Port HCA Adapter 20Gbps
        • 2 x HP NC360T Dual Port PCI-e Gigabit Card
      • Power Supply: 2 x Seasonic 400W 80 Plus Platinum Fanless ATX12V/EPS12V

      Supermicro X10SAE-O:

      • CPU: Intel Xeon E3-1231 v3 “Haswell”
      • Motherboard: Supermicro X10SAE-O
      • Raid Controller: LSI Internal SATA/SAS 9211-8i
      • Memory: Kingston 32GB Kit DDR3 1600MHz PC3
      • Disks:
        • Lexar Echo ZX 16GB
        • SSD: 2 x Sandisk Ultra II 240GB
      • Network Cards:
        • HP Infiniband DDR Dual Port HCA Adapter 20Gbps
        • HP NC360T Dual Port PCI-e Gigabit Card
      • Power Supply:
        • SS-520FL2 520W ATX12V / EPS12V 80 PLUS PLATINUM


      Any assistance would be greatly appreciated.
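For reference, the post does not say how the second SSD in each host was tagged as an HDD. On ESXi 5.5/6.0 one common way is a SATP claim rule with the `disable_ssd` option followed by a reclaim of the device; the sketch below assumes that method. The device ID is a hypothetical placeholder, and the commands are only printed unless `DRY_RUN=0` is set in the ESXi shell.

```shell
#!/bin/sh
# Sketch: present a local SSD as a non-SSD (capacity) disk for vSAN 5.5.
# ASSUMPTION: the SATP claim-rule method; device IDs passed in are hypothetical.
# Commands are printed by default; set DRY_RUN=0 to execute them on the host.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

tag_ssd_as_hdd() {
  dev="$1"   # e.g. naa.xxxxxxxx (hypothetical)
  if [ -z "$dev" ]; then
    echo "usage: tag_ssd_as_hdd <device-id>" >&2
    return 1
  fi
  # Add a claim rule that clears the SSD flag for this device.
  run esxcli storage nmp satp rule add --satp=VMW_SATP_LOCAL --device="$dev" --option=disable_ssd
  # Reclaim the device so the new rule takes effect.
  run esxcli storage core claiming reclaim -d "$dev"
  # Show the device details; "Is SSD" should now read false.
  run esxcli storage core device list -d "$dev"
}
```

For example, `tag_ssd_as_hdd naa.xxxxxxxx` prints the three commands so they can be reviewed before running them for real.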




        • 1. Re: Homelab vSAN woes
          VirtualizingStuff Novice

          I am going to see if I can get 6 different SSDs that are on the HCL and test.

          • 2. Re: Homelab vSAN woes
            zdickinson Expert

            I would be curious about two things.


            1.) What is the behavior if the capacity tier is an actual HDD instead of an SSD tagged as one?
            2.) What is the behavior if this is set up as an all-flash vSAN in v6?


            Thank you, Zach.

            • 3. Re: Homelab vSAN woes
              VirtualizingStuff Novice

              Hello Zach,

              1. I will put in 3 HDDs tomorrow evening and try it again.


              2. I can try that, but my HBA (9211-8i) had been removed from the HCL last I checked.


              I will report back. Thank you for the reply.

              • 4. Re: Homelab vSAN woes
                beeguar Enthusiast

                Mine isn't on the HCL either for vSAN 6, but I'll probably be trying out the upgrade soon.


                Considering vSAN is a software abstraction layer over the storage, as long as it's functional on 5, I'm not aware of any version 6 features that would make it less so in 6.

                • 5. Re: Homelab vSAN woes
                  jonretting Enthusiast

                  Your systems are probably choking on the queue depths needed at the flash disk level. Also, what magnetic disks are you using? If they aren't SAS, they will choke on the queues as well. I have also always had problems with VSAN 5.5 when marking an SSD as magnetic and placing it in a VSAN: usually tons of sense errors, and frequent drops/disconnects from the VSAN. I would recommend switching to Intel enterprise PCIe NVMe SSDs and some SAS magnetic disks. If you upgrade to VSAN 6 you can use all-flash arrays, but trying this in 5.5 will end in tears. Cheers

                  • 6. Re: Homelab vSAN woes
                    ChrisKuhns Enthusiast

                    I too have had issues with marking the SSDs, but they have never once dropped or disconnected from the VSAN. Also, I haven't had any issues with the magnetic disks. The 7.2K SATA disks have performed perfectly fine, but that may be because the SSDs I have are top tier.

                    • 7. Re: Homelab vSAN woes
                      jonretting Enthusiast

                      The drops I have had with SSDs marked as magnetic usually surface during boot storms or mass storage policy changes. All in all I have never had a problem with SATA, except for crazy latencies. And you are totally right, your top-tier SSDs make all the difference in the world. Cheers

                      • 8. Re: Homelab vSAN woes
                        cheesyboofs01 Lurker

                        Not to hijack your thread, but I am having a similar issue.


                        I have been reluctant to post anything because people tend to just wave the HCL at you.


                        I have three HP Gen8 MicroServers. Each server has a 500GB Crucial SSD and a 4TB 7.2K WD Red Pro, presented as RAID 0.


                        I have installed ESXi 5.5 U3, set up a cluster, and enabled and licensed VSAN.

                        I had to downgrade the hpvsa driver to scsi-hpvsa-5.5.0-88OEM.550.0.0.1331820.x86_64.vib to get the B120i to play properly, as I understand HP broke the driver.
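A rough sketch of that driver rollback from the ESXi shell, assuming the VIB has already been copied to a datastore on the host (the `/vmfs/volumes/datastore1/` path in the usage line is a hypothetical example). `--force` and `--no-sig-check` bypass dependency/acceptance and signature checks, so treat this as a lab-only approach; the host needs a reboot before the older driver is loaded. Commands are printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch: install the older scsi-hpvsa VIB for the B120i on ESXi 5.5.
# ASSUMPTION: the VIB is already on a datastore; any path used is hypothetical.
# Commands are printed by default; set DRY_RUN=0 to execute them on the host.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

downgrade_hpvsa() {
  vib="$1"
  # esxcli expects an absolute path to a local VIB file.
  case "$vib" in
    /*) ;;
    *) echo "usage: downgrade_hpvsa </absolute/path/to.vib>" >&2; return 1 ;;
  esac
  # --force skips dependency/acceptance checks; --no-sig-check skips signature checks.
  run esxcli software vib install -v "$vib" --force --no-sig-check --maintenance-mode
  # After rebooting, confirm which scsi-hpvsa version is installed.
  run esxcli software vib get -n scsi-hpvsa
}
```

Usage: `downgrade_hpvsa /vmfs/volumes/datastore1/scsi-hpvsa-5.5.0-88OEM.550.0.0.1331820.x86_64.vib`, then reboot the host.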

                        All the drives are seen fine by the server under iLO and SSA.


                        The problem now is that everything is hunky-dory until I try to use the new datastore, at which point the SSDs randomly drop offline and the transfer fails: sometimes one SSD, sometimes two, sometimes all three.

                        I can tear down the datastore and instantly recreate it and the disks are all picked up.


                        Can't help thinking this is related to that B120i driver I had to downgrade.





                        Bit stumped what to do now, though! I don't want to throw any more money at it. I could try upgrading to ESXi 6, as I know VSAN was overhauled, but it's a lot of effort to rule it out and there is still the driver issue.

                        • 9. Re: Homelab vSAN woes
                          jonretting Enthusiast

                          Well, firstly I would say your flash disks are not up to the task. Besides that, they shouldn't be dropping. Do you have logs for the event? Any sense error codes? There is probably a steady stream of them.


                          You may want to try placing the flash disks on a different controller (if possible) and see if the problem persists. Does the drop happen for each host?


                          The basic first steps would be to check your firmware (all around) and make sure you have the appropriate storage driver. When I say all around, I mean motherboard, HBA, and backplanes. When disks are dropping, it's usually fixed with the right firmware/driver combo.


                          Also, if you can, try replacing the Crucial SSDs with something else. I am running 4x Intel 750 Series 400GB NVMe disks: great price point and outstanding performance. They are not on the HCL or the VSAN:HCL, but so far they are an exceptional value. I am assuming you're not using the Crucial SAS SSD but the consumer SATA models. Even if you get it "stable", your storage hardware won't deliver anything usable. Expect very high latencies and very low IOPS, and it will get far worse during VSAN object operations. The VSAN:HCL is there to help you choose the right hardware. All in all, consumer flash doesn't suffice, and without good flash, SATA magnetics perform terribly. Also, four hosts in a VSAN cluster really is the minimum.


                          Good Luck!

                          • 10. Re: Homelab vSAN woes
                            jonretting Enthusiast

                            Oops, almost forgot... Try re-conditioning the SSD drives in each host. Put VSAN into "Manual mode". Start with the first host, put it into maintenance mode and evacuate all the data, and then boot that host into a Linux live distro. Use "sudo fdisk -l" to find the path to your SSD (e.g. "/dev/sdb"). Then use "parted" to re-initialize the drive cleanly.


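                            # WARNING: this destroys the partition table; confirm the device path with "sudo fdisk -l" first.
                            # Write an msdos label, then a fresh GPT label (-s runs parted non-interactively).
                            sudo parted -s /dev/sdb mklabel msdos
                            sudo parted -s /dev/sdb mklabel gpt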


                            That will clear all partitions.


                            Boot back into ESXi and add the host's disks back into the VSAN using Disk Management. Verify the host is now part of the VSAN and all the hosts are communicating with each other (look for a yellow exclamation point on the host icon). Take the host out of maintenance mode, and repeat the process for the next host in the cluster.


                            I have fixed various VSAN issues with drives dropping by using this procedure.
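The ESXi-side parts of this procedure can also be run from the ESXi shell instead of the Web Client. A minimal sketch, assuming one host at a time and hypothetical device IDs; `esxcli vsan storage automode` is assumed here as the per-host counterpart of the cluster's "Manual mode", and `--vsanmode evacuateAllData` as the counterpart of "evacuate all the data". The Linux live-distro wipe happens between the two functions. Commands are printed unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch of the ESXi-side steps around the per-host SSD re-initialization.
# ASSUMPTIONS: run on one host at a time; device IDs are hypothetical placeholders.
# Commands are printed by default; set DRY_RUN=0 to execute them on the host.

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Before rebooting into the Linux live distro.
prepare_host() {
  # Stop vSAN from claiming disks automatically on this host.
  run esxcli vsan storage automode set --enabled false
  # Enter maintenance mode and migrate all vSAN data off this host.
  run esxcli system maintenanceMode set --enable true --vsanmode evacuateAllData
}

# After booting back into ESXi with re-initialized disks.
rejoin_host() {
  ssd="$1"; capacity="$2"   # e.g. naa.aaaa naa.bbbb (hypothetical)
  if [ -z "$ssd" ] || [ -z "$capacity" ]; then
    echo "usage: rejoin_host <cache-ssd-id> <capacity-disk-id>" >&2
    return 1
  fi
  # Recreate the disk group: one cache SSD plus one capacity disk.
  run esxcli vsan storage add --ssd "$ssd" --disks "$capacity"
  # Check cluster membership before moving on to the next host.
  run esxcli vsan cluster get
  run esxcli system maintenanceMode set --enable false
}
```

In order: `prepare_host`, reboot into the live distro and run parted, then `rejoin_host naa.aaaa naa.bbbb` once the host is back in ESXi.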

                            • 11. Re: Homelab vSAN woes
                              cheesyboofs01 Lurker

                              Thanks for your comments.


                              As the OP's title suggests, this is a home lab setup too. I think 4 x PCIe flash cards may be a little overkill.


                              I think the problem (for me) was the almost non-existent queue depth on the B120i RAID controller. I have now installed three H220 SAS host bus adapters in my three Gen8 MicroServers, and they now seem to be playing ball.



                              • 12. Re: Homelab vSAN woes
                                VirtualizingStuff Novice

                                Apologies for the delay. Over the last couple of days I upgraded my homelab to vSphere 6 (thanks to EVALExperience) and am happy to report that vSAN is running beautifully over the 20Gb InfiniBand network. I was able to create and clone VMs, and the SSDs did not go into an unhealthy status. Thanks, everyone, for your input and suggestions!

                                • 13. Re: Homelab vSAN woes
                                  bilbobagginz Lurker

                                  VirtualizingStuff, have you installed ESXi 6.0 on the X9SCM-F?


                                  I am currently setting up a home lab and I am wondering which version I should use.

                                  • 14. Re: Homelab vSAN woes
                                    VirtualizingStuff Novice

                                    Yep, I currently have the latest version of ESXi 6.x running in the homelab with no issues.
