Clustered ESXi 4.x with InfiniBand SRP storage target


    Hello everyone

     

    This document will show you how to create an InfiniBand storage network with ESX/ESXi 4.x servers.

     

    Overview:

    • Creating the storage back-end
    • Creating the ESX/ESXi configuration using the Mellanox InfiniBand drivers.

     

    Requirements:

    • An InfiniBand switch, preferably one with a built-in subnet manager. You can get one cheaply on eBay if you decide to go the budget route.
      • I used a Topspin 90.
    • InfiniBand cards. Mellanox ConnectX cards are working properly for me.
      • Mellanox InfiniHost III HBAs will apparently also work with ESXi; however, I have not tested them myself, this is second-hand information from someone who told me it was working.
    • InfiniBand cables. These are not cheap.
    • Know how to build a kernel.
    • The ESXi driver for the ConnectX card.

     

    • At least three systems:
      • Two front-end hosts that are on the VMware HCL for an ESXi installation.
      • One file server, preferably quality hardware to prevent issues. x86_64 (64-bit) hardware only.
        • A true hardware RAID card is recommended. I have not tested with anything else, and software RAID will FAIL for this setup.
          • Single SATA/SAS drives can also be used; however, there will be no redundancy.
        • ABSOLUTELY NO IDE DRIVES. This will not work, as the setup needs a SCSI-class disk.

     

     

     

    Instructions:

    • InfiniBand switch
        • Make sure the subnet manager is configured and enabled.
        • If the switch does not have a subnet manager, we will have to use a software subnet manager, which is not recommended.
          • An example is shown later on.

       

      File server configuration.

      • There are size limits depending on the disks or volumes you wish to create; see below.
      • Disk creation steps:
        • Create a 30GB partition; this one will be used for the Linux-based operating system.
        • Create multiple arrays of 1.99TB or less. Do not go over 1.99TB, as VMware will issue an error while creating the VMFS volumes (see the sketch after this list).
        • DO NOT USE LVM, as it is unsupported for this.
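      If your RAID controller presents one large data volume instead of letting you create separate sub-2TB arrays, you can carve it into partitions of 1.99TB or less with parted. A minimal sketch, assuming the data volume shows up as /dev/sdb (a hypothetical name; the sizes are only examples):

          # label the data volume with GPT so large offsets are possible
          parted -s /dev/sdb mklabel gpt
          # carve it into chunks of 1.99TB or less for VMFS
          parted -s /dev/sdb mkpart primary 1MB 1990GB
          parted -s /dev/sdb mkpart primary 1990GB 3980GB
          # double-check the resulting layout
          parted -s /dev/sdb print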

       

      • Install CentOS 5, or a compatible Linux operating system, on it. Do a minimal installation; we will add packages as we go.
        • Install it on the first disk. At this step it does not matter whether you use LVM for the operating system disk. However, LVM is unsupported for the data disks and will most likely cause trouble later on when we export them to VMware over InfiniBand.

       

      • Once installed, we need to install a few packages:
        • yum install gcc cpp gcc-c++ rpm-build subversion ncurses-devel lsscsi

       

      • Download and install the InfiniBand driver pack.
        • Download the OFED-1.5.3.1 installation package from mellanox.com. You need the Linux package.
        • If you are not using CentOS, you will still have to download the installation package; however, you will not be able to install it the easy way.
          • Extract the OFED package; there will be a folder called SRPMS. Go into it.
            • Copy the files listed below to /usr/src/infiniband and extract them with the rpm2tar command. You will need to extract the results two more times until you see the actual source folders.
              • The list: ibutils, libibmad, libibumad, libibverbs, libipathverbs, libmthca, opensm, srptools.
                • The list might be incomplete, as I did not build on this system; for simplicity I am writing the instructions for it anyway. I built using Gentoo x86_64 with the science overlay via layman.
              • Once extracted, go into each folder, run ./configure --prefix=/usr, and build (see the sketch after this list).
                • If the configure step fails, install the missing dependency; it should be one of the packages listed above.
        • If you are using CentOS or another RPM-based operating system, install the package by running ./install.sh from the extracted OFED tarball.
          • You might want to install the package in advanced mode and skip the two kernel drivers, as they will not be needed.
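      For the source-build path described above, a minimal sketch of the extract/configure/build cycle for one package (the file names and versions are examples; use the ones from your SRPMS folder and repeat for each package in the list):

          cd /usr/src/infiniband
          rpm2tar libibverbs-*.src.rpm        # .src.rpm -> .src.tar
          tar xf  libibverbs-*.src.tar        # unpacks the spec file and the source tarball
          tar xzf libibverbs-*.tar.gz         # unpacks the actual source folder
          cd libibverbs-*/
          ./configure --prefix=/usr
          make && make install
          cd /usr/src/infiniband              # then move on to the next package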

       

      • The driver pack is installed? Good, time to upgrade the kernel to the latest version, 2.6.38.2 at the time of this writing; use the latest kernel available from kernel.org, as some bugs will have been fixed since.
        • Extract it to /usr/src/linux-2.6.38.2
      • With the kernel extracted, it is time to get the software we will use for the SRP (InfiniBand) target: SCST. It needs to end up under /usr/src/infiniband/scst (a checkout sketch follows after this block).
      • Go into the kernel tree: cd /usr/src/linux-2.6.38.2
        • Type: ls /usr/src/infiniband/scst/scst/kernel/
          • You should see something like scst_exec_req_fifo-2.6.38.patch, and it should roughly match your kernel. If you do not see a patch file matching your kernel, there are two possibilities: the patch is now included as part of the kernel, or that kernel has not been tested with this SCST release. It is safer to use a kernel version that is listed there.
          • To patch, type: patch -p1 < /usr/src/infiniband/scst/scst/kernel/scst_exec_req_fifo-2.6.38.patch
            • It should report that hunks #1, #2, #3 and #4 succeeded.
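      As mentioned above, SCST itself has to be fetched before the patch step. A minimal sketch of the checkout, assuming the SourceForge Subversion URL that was current for SCST 2.x (verify it on scst.sourceforge.net if it has moved):

          cd /usr/src/infiniband
          # check out the SCST trunk; it contains the scst core, the srpt driver and scstadmin
          svn checkout https://scst.svn.sourceforge.net/svnroot/scst/trunk scst
          ls scst
          # should list, among others: scst/  srpt/  scstadmin/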

       

        • Kernel configuration is the hard part for most of us, so I will keep it simple.
            • Type lspci and take note of the devices listed there: SCSI/RAID controller, network card, InfiniBand card. Those are the most important ones.
              • Type: make menuconfig
                • Enable the block layer -> IO Schedulers -> CFQ I/O scheduler
                • Processor type and features
                  • Preemption Model (No Forced Preemption (Server))
                  • Timer frequency (250 HZ)
                • Networking support -> Networking options -> TCP/IP networking -> TCP/IP zero-copy transfer completion notification -> [*]
                  • This is optional; however, it might help in the event you decide to go with iSCSI instead of SRP, or run dual-stack SRP + iSCSI as failover.
                • Device Drivers -> Generic Driver Options ->
                  • [*] Maintain a devtmpfs filesystem to mount at /dev (this saves you from building an initramfs later on)
                  • [*] Automount devtmpfs at /dev, after the kernel mounted the rootfs
                • Device Drivers -> SCSI device support ->
                  • [*] SCSI disk support
                  • [*] SCSI low-level drivers
                    • Select your RAID card driver here and set it as [*] or <*>.
                      • If the card is an LSI, select all four LSI modules as [*] or <*>.
                • Device Drivers -> ATA/ATAPI/MFM/RLL support (DEPRECATED): uncheck it < >
                • Device Drivers -> Serial ATA and Parallel ATA drivers
                  • If you are not using a RAID card, or use SATA drives directly on the system board, go there and set it as <*>.
                    • Select the drivers you are using and set them as <*> or [*].
                • If you are using an LSI or MegaRAID card, you must enable another option:
                  • Device Drivers -> Fusion MPT device support: set them all as <*>.
                • Are you using LVM? If so, enable the options under Multiple devices driver support (RAID and LVM); however, you will then have to recreate an initrd, which I will not describe here.

                • Now we need the target infrastructure in the kernel. The relevant options show up under:
                  • Device Drivers -> Generic Target Core Mod (TCM) and ConfigFS Infrastructure
                  • Set them all as <M>. (This is the kernel's own target core; the SCST target itself is built out of tree in a later step.)
                • We also need to enable the InfiniBand stack in the event it is not enabled.
                  • Device Drivers -> InfiniBand support <M>
                    • Select everything under InfiniBand support as <M>, except IP-over-InfiniBand data path debugging, which stays off [ ].
                • This is all for the quick configuration. Compile and install (a sketch of the commands follows below).
                  • Once installed, restart; if it went properly, you should be running the new kernel, and we can carry on from there.
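                A minimal sketch of that compile-and-install step, assuming the kernel tree from above and a CentOS-style boot setup (adapt the bootloader part to your distribution):

                    cd /usr/src/linux-2.6.38.2
                    # build the kernel and its modules; adjust -j to the number of CPU cores
                    make -j4
                    make modules_install
                    # on CentOS/RHEL this copies the kernel to /boot, builds an initrd and adds a grub entry
                    make install
                    # make sure the new kernel is the default in /boot/grub/grub.conf, then reboot
                    reboot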
        • Does the switch have a subnet manager? If not, follow this:
          • You must install and configure opensm. It is part of the OFED package.
          • I included a basic configuration example; however, you will have to test the different options yourself (see also the sketch below).
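          A minimal sketch of getting opensm running as the software subnet manager (the port GUID below is only a placeholder; take yours from the ibstat output):

              # find the port GUID of the HCA port the subnet manager should run on
              ibstat
              # start opensm in the background, bound to that port (GUID is an example)
              opensm -B -g 0x0002c903000e1234
              # or, if the OFED install provided the init script, use it and enable it at boot
              /etc/init.d/opensmd start
              chkconfig opensmd on
              # verify that a subnet manager is active and that the ports go to ACTIVE
              sminfo
              ibstat | grep State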

         

        • Installation and configuration of the SCST InfiniBand target.
          • Go under /usr/src/infiniband/scst/scst
            • We need to patch it for InfiniBand support:
              • patch -p1 < ../srpt/patches/scst_increase_max_tgt_cmds.patch
            • Now just type: make && make install
          • Go under /usr/src/infiniband/scst/scstadmin
            • make && make install
          • Go under /usr/src/infiniband/scst/srpt
            • make && make install
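          Once the three builds have completed, it is worth checking that the modules actually landed under the running kernel before going further; a quick sanity check could look like this:

              # the SCST core, the vdisk handler and the SRP target module should all resolve
              modinfo scst scst_vdisk ib_srpt | grep filename
              # they normally land under the extra/ directory of the running kernel's module tree
              ls /lib/modules/$(uname -r)/extra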

         

         

        • At this point, we are ready to configure the drives. Let's start by loading some modules; they have to be loaded in a specific order.
          • modprobe scst
          • modprobe scst_vdisk
          • modprobe ib_srpt
        • Now that we have the modules loaded, we need to start configuring them.
          • Type: find /dev/disk/by-id
            • Take note of the disks that you want to share (see the sketch after this block).
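        The modprobe calls above only last until the next reboot. One simple way to make them persistent on CentOS 5, and to double-check which block devices you are about to export, is sketched below (the example disk ID is hypothetical):

            # load the SCST modules in the required order at every boot
            echo "modprobe scst"       >> /etc/rc.d/rc.local
            echo "modprobe scst_vdisk" >> /etc/rc.d/rc.local
            echo "modprobe ib_srpt"    >> /etc/rc.d/rc.local

            # list the stable by-id names; use these, not /dev/sdX, when adding devices
            lsscsi
            find /dev/disk/by-id -name 'scsi-*'
            # example output (hypothetical): /dev/disk/by-id/scsi-SAdaptec_array1_ABCD1234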

         

        • Go under the blockio handler: cd /sys/kernel/scst_tgt/handlers/vdisk_blockio
          • Type: cat mgmt
            • This will show you an example of the configuration syntax.
          • echo "add_device disk1 filename=/dev/disk/by-id/scsi-SAdaptec_diskname" > mgmt
            • Repeat for all the devices that need to be added.
              • This builds the configuration in memory; later on we will dump it to disk.
          • Once all the disks are added, go under the target LUNs: cd /sys/kernel/scst_tgt/targets/ib_srpt/ib_srpt_target_0/luns
            • Type: cat mgmt
              • This will show you another help text.
            • Type: echo "add disk1 0" > mgmt  # the 0 is the LUN number; numbering MUST start at 0
            • Type: echo "add disk2 1" > mgmt
          • Once all disks and targets are configured, we need to activate the configuration; but first, let's look at it:
            • scstadmin -list_targets
            • scstadmin -list_device
            • scstadmin -list_group
            • scstadmin -list_session — use this one often. Also look at the dmesg output from time to time to see what is happening.
            • They should all look consistent with each other.
          • If everything looks good, we will save the configuration and inspect it in a file:
            • scstadmin -write_config /etc/scst.conf
            • You might want to do a cat /etc/scst.conf at this point to check that everything is configured properly.
          • Since the target is not yet visible, enable it: cd /sys/kernel/scst_tgt/targets/ib_srpt/ib_srpt_target_0
            • echo 1 > enabled
            • Time to save the configuration again (a consolidated sketch of the whole sequence follows below).
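          To tie it together, here is a consolidated sketch of the whole sequence as one script; the device names and the by-id paths are hypothetical and must be replaced with the disks you noted earlier:

              #!/bin/sh
              # load the modules in the required order
              modprobe scst
              modprobe scst_vdisk
              modprobe ib_srpt

              # register the block devices with the blockio handler (names and paths are examples)
              cd /sys/kernel/scst_tgt/handlers/vdisk_blockio
              echo "add_device disk1 filename=/dev/disk/by-id/scsi-SAdaptec_array1" > mgmt
              echo "add_device disk2 filename=/dev/disk/by-id/scsi-SAdaptec_array2" > mgmt

              # map the devices to LUNs on the SRP target; LUN numbering must start at 0
              cd /sys/kernel/scst_tgt/targets/ib_srpt/ib_srpt_target_0/luns
              echo "add disk1 0" > mgmt
              echo "add disk2 1" > mgmt

              # make the target visible to the initiators
              echo 1 > ../enabled

              # persist the running configuration
              scstadmin -write_config /etc/scst.conf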

         

          • I included a few configuration examples to ease the pain this might create. It took me a week working on this, 17 hours a day, figuring it out with the poor instructions I could barely find. No offence intended to the readers of this document; it might just be me.

         

        Section 1 completed.

        ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

         

         

        VMware configuration.

        • If you have the Mellanox ConnectX HBA cards and use ESX/ESXi 4.0-4.1:
          • Please download the driver from the Mellanox website, as listed at the top of the document.
            • If for some reason that driver does not work, use this one: MEL-OFED-1.4.1-375-offline_bundle.zip.
              • The driver listed above is UNSUPPORTED, as it is a BETA driver.

         

        • Once the driver is installed, you should see a new device under Storage Adapters called vmhba_mlx4_0.1.1. If the device is present, we are on the right track; however, we might need to reconfigure it to use InfiniBand, as it might not detect the connection type properly.
        • Log on to the console and configure the card to force the InfiniBand connection type:
            • Type: esxcfg-module -s 'port_type_default=1' mlx4_en
            • Restart the server and you should be able to see the LUNs within a minute or two (see the sketch after this block).
        • From here on, you should be able to create a new VMFS volume on the newly discovered LUNs. You might want to test the performance; however, you MIGHT NOT be able to see the load from the file server side. Mine always shows idle, even at a 700MB/s transfer rate.
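        A short sketch of that console step, including a check of the stored option string before rebooting (run from the ESX service console or ESXi Tech Support Mode):

            # force the ConnectX port type to InfiniBand for the driver
            esxcfg-module -s 'port_type_default=1' mlx4_en
            # confirm the option string was stored before rebooting
            esxcfg-module -g mlx4_en
            # after the reboot, rescan the new adapter if the LUNs do not show up on their own
            esxcfg-rescan vmhba_mlx4_0.1.1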

         

        Apart from that, there is nothing to configure on the ESX server for the SRP disks; they are detected automatically, as the SRP specification intends. However, if you have issues with the detection, you might want to look at the file server's dmesg output.

         

        You can also increase or decrease the verbosity by changing the value in the files called trace_level, located under the various directories of /sys/kernel/scst_tgt/ on the file server. Do a cat of them first to see the available options.
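        For example, a sketch of what that looks like for the SRP target driver; the exact flag names to use are the ones shown by the cat output on your system:

            # show the current trace level and the accepted commands/flags
            cat /sys/kernel/scst_tgt/targets/ib_srpt/trace_level
            # turn on one of the flags listed there (the flag name here is only an example)
            echo "add debug" > /sys/kernel/scst_tgt/targets/ib_srpt/trace_level
            # and turn it off again once you are done troubleshooting
            echo "del debug" > /sys/kernel/scst_tgt/targets/ib_srpt/trace_level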

         

        As a side note, more than one network card may show up because of the InfiniBand card. They should work as-is; I cannot guarantee that, nor can I guarantee this document will work without a hitch, so I strongly recommend heavy testing before going into production.

         

         

        If, by any chance, a reader of this document knows how to create functional VLANs under an InfiniBand configuration, would you be kind enough to contact me? I have not managed it yet, and I cannot test as much as I would like because these systems are in production.

         

        Regards,

        Patrick

        I hope this will make life easier for a lot of people trying to get InfiniBand working with ESXi 4.x.