Sparticus13
Contributor

40GB Infiniband for backend SAN to ESXi 5.1 Blades

Hi all,

I'm writing today to discuss my proposed SAN and network design for an upcoming project. I'm looking for any suggestions and info regarding this design. The key technologies that I hope to incorporate are:

Mellanox 40GB InfiniBand (ConnectX-2/3, InfiniScale IV/SwitchX-2)

Windows Storage Spaces

Windows Scale-Out File Server Cluster

RDMA

The SAN is going to be a 3 node Scale-Out File Server using Windows Server 2012 Storage Spaces with a DataOn DNS-1640D JBOD appliance.  Below is a link to the general setup.  SSDs will be used for the disks.

http://www.dataonstorage.com/images/PDF/Solutions/DLS/DataON_Microsoft_Server_2012_Storage_Space_HA_...

We will be using LSI 9207-8e HBAs to attach to the DNS-1640D. We are looking to use a 40GB RDMA solution to create the 3-node Scale-Out File Server Cluster. From there we would need to connect the SAN cluster to our blade servers using a 40GB switch. We are currently using Supermicro TwinBlades, model SBI-7227R-T2. Currently Supermicro offers a 40GB 4x QDR switch for the blade enclosure, model SBM-IBS-Q3616M, which is based on the InfiniScale IV silicon. The blades can use Supermicro's mezzanine cards, model AOC-IBH-XQD, based on the ConnectX-2 silicon. The plan is to connect the Supermicro blade enclosure switch to a larger 40GB Mellanox switch sitting between the SAN and the blades.

The SAN cluster will be all physical systems with no virtualization, while our blades all run VMware ESXi 5.1 with Windows Server 2012 VMs. I am not sure if RDMA can be achieved between physical and virtual environments, such as from our SAN cluster to the blades. I also don't know much about 40GB networking in VMware. I know that VMware is working on a paravirtual RDMA solution, but I don't think it is available yet. I believe you can use pass-through to a VM, or SR-IOV to assign virtual functions to VMs, but I don't know much about implementing either of these, and pass-through would not be an option for our environment.
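In case it helps frame the pass-through question, below is a rough Python sketch (using the pyVmomi SDK, which is purely my assumption here, along with the placeholder vCenter name and credentials) that walks the hosts and reports which PCI devices ESXi considers passthrough-capable. That would at least show whether the ConnectX mezzanine cards are candidates before deciding between pass-through and SR-IOV.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Rough sketch: report PCI passthrough capability per ESXi host via pyVmomi.
# The vCenter address and credentials below are placeholders.
si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="secret", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        # Map PCI ids to readable device names for the report.
        names = {dev.id: dev.deviceName for dev in host.hardware.pciDevice}
        print(host.name)
        for info in host.config.pciPassthruInfo:
            if info.passthruCapable or info.passthruEnabled:
                print("  %-12s %-40s capable=%s enabled=%s active=%s" % (
                    info.id, names.get(info.id, "?"),
                    info.passthruCapable, info.passthruEnabled,
                    info.passthruActive))
    view.DestroyView()
finally:
    Disconnect(si)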

We have been thinking of the SX6018 model switch from Mellanox to sit between the SAN and the blades. As for the SAN adapters, we were thinking of the MCX314A-BCBT or the MCX354A-FCBT. I'm also not sure if these adapters and switch will work correctly with the Supermicro products, as the Supermicro parts are based on ConnectX-2 and InfiniScale IV instead of ConnectX-3 and SwitchX-2 silicon.

If anyone has had any experience with Mellanox on VMware and/or RDMA, I would like to hear what they learned and what worked or didn't. Any general suggestions and information that anyone has to share about my design or ideas would be great. Also, the SAN is not going to be used to store the VMs or VMDKs; it's just for other data that the VM guest OSes will access.

Thanks!

Chris

11 Replies
RS_1
Enthusiast

Hi Chris,

InfiniBand sounds good and AFAIK is fully supported on vSphere 5.1, but why Windows Server on the backend instead of something like ZFS?

Sparticus13
Contributor

It's good to know InfiniBand is supported in 5.1. All of our blade servers run Windows Server, as that is the platform our web product is built on. Recently we upgraded them all to 2012 from 2008 R2. With 2012 there are a host of new storage improvements. While ZFS may be a better underlying storage system than Storage Spaces, we chose to use Windows Server for the SAN to take advantage of its new SMB 3.0, SMB Direct (RDMA over SMB), and Scale-Out File Server features. These features work very well with our blade Windows Servers and I think they make the best choice for our environment. I'm still not entirely sold on the Storage Spaces JBOD solution, but I really do like the other features.

Plus, Microsoft's labs did a test setup using the DataOn JBOD device and 3 Windows Server 2012 nodes with Mellanox adapters and got some pretty amazing IOPS and bandwidth numbers. Currently the DataOn appliance is certified for Windows Storage Spaces, and I believe it is the only one that is at the moment.
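For anyone trying the same thing, here's a small Python sketch of the sanity check I plan to run on the physical SOFS nodes to confirm SMB Direct/RDMA is actually in use. The cmdlet names (Get-NetAdapterRdma, Get-SmbMultichannelConnection) are the Server 2012 ones as I understand them from the Microsoft write-ups, so treat the details as an assumption rather than something I've verified end to end.

import subprocess

def run_ps(command):
    """Run a PowerShell command on this Windows node and return its output."""
    return subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    # Are the Mellanox adapters exposing RDMA to Windows?
    print(run_ps("Get-NetAdapterRdma | Format-Table Name, Enabled -AutoSize"))
    # Are active SMB 3.0 connections actually negotiating RDMA?
    print(run_ps("Get-SmbMultichannelConnection | "
                 "Format-Table ServerName, ClientRdmaCapable, ServerRdmaCapable -AutoSize"))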

Does anyone have any ideas about RDMA in vSphere 5.1 and whether it would play nicely with a physical-to-virtual route?

Thanks!

RParker
Immortal

Sparticus13 wrote:

It's good to know InfiniBand is supported in 5.1. All of our blade servers run Windows Server, as that is the platform our web product is built on. Recently we upgraded them all to 2012 from 2008 R2. With 2012 there are a host of new storage improvements. While ZFS may be a better underlying storage system than Storage Spaces, we chose to use Windows Server for the SAN to take advantage of its new SMB 3.0, SMB Direct (RDMA over SMB), and Scale-Out File Server features. These features work very well with our blade Windows Servers and I think they make the best choice for our environment. I'm still not entirely sold on the Storage Spaces JBOD solution, but I really do like the other features.

Plus, Microsoft's labs did a test setup using the DataOn JBOD device and 3 Windows Server 2012 nodes with Mellanox adapters and got some pretty amazing IOPS and bandwidth numbers. Currently the DataOn appliance is certified for Windows Storage Spaces, and I believe it is the only one that is at the moment.

Does anyone have any ideas about RDMA in vSphere 5.1 and whether it would play nicely with a physical-to-virtual route?

Thanks!

Good information. It's good to see some people are still able to put Microsoft in a good light instead of treating it as the old computer company from the 90s.

I think people severely underestimate the Windows NTFS file system (which just received a bunch of new changes in 2008/2012). It's not only viable, but there's one thing people FAIL to realize...

How big is Microsoft? 5,000? 10,000? Try over 90,000 employees, ALL basically in one place, and over 65 billion in revenue. Bottom line: this is NOT a small company.

What software do people think Microsoft uses? Yeah, WINDOWS! NTFS is a GREAT file system, much better than Reiser, ZFS, or any other file system that Linux has to offer... but people don't give Microsoft any credibility. Why? I still haven't figured that out. I will take a guess that people ASSUME that since Windows is generally and publicly available, it must be an off-the-shelf generic product.

NOT TRUE!

I work for an 80K-plus employee company now that is EXCLUSIVELY Windows, and they get along JUST fine... they even create products and sell devices that use Windows every day, so apparently there are many HUGE companies that recognize Windows and NTFS and don't have issues.

I like to see people post things like this, to REMIND everyone there *IS* still the OS that's been around for almost 40 years...I don't think that's an accident.

mcowger
Immortal

Explain to me, exactly, how NTFS is better than every other filesystem, including ZFS, etc?

I mean, it's good, yes, but it certainly lacks some nice capabilities that ZFS has (inline dedupe, just for one example).

--Matt VCDX #52 blog.cowger.us
AKostur
Hot Shot

I like to see people post things like this, to REMIND everyone there *IS* still the OS that's been around for almost 40 years...I don't think that's an accident.

Erm... going somewhat off-topic, but Windows isn't even 30 years old (Windows 1.0 was released in 1985). Even then, Windows 1.0 isn't really a direct ancestor of current Windows. You'd have to go to Windows NT for that, and that didn't come around until 1993, which makes the current Windows barely 20 years old. (And I could see arguments that you might want to move the ancestry up to Windows 2000.)

Sparticus13
Contributor

Hey all,

I'm hoping to get more of a technically minded discussion about this topic and not so much of a back-and-forth about Linux vs. Windows or other platform choices. I am open to suggestions about the back-end storage but would prefer not to see this turn into a battle, so to speak. :)

At any rate, I found a great post with info related to Windows Server 2012 using SMB 3.0 and RDMA with Mellanox products.

http://blogs.technet.com/b/josebda/archive/2013/01/03/updated-links-on-windows-server-2012-file-serv...

I found this link from the post particularly useful.

https://www.eiseverywhere.com/file_uploads/a7f56239742103ecf0c09c03d9145265_SNWFall2012_HighThroughp...

It shows three different 10/40GB RDMA solutions. The ConnectX-2 and 3 are shown there in test setups, one of which uses a Storage Spaces JBOD just as I will be doing.

I'm still unsure how ConnectX-2 and 3 will play together when mixed. I've asked Supermicro if they have any plans to release an updated ConnectX-3 card for their blades and a newer SwitchX-2 switch, and am waiting to hear back. I also asked these questions of Mellanox support, who forwarded them to internal sales, and am waiting to hear back from them as well. I will post what I find out.

Just for kicks, this forum thread follows someone using Supermicro blades and the ConnectX-2 cards with a Solaris ZFS-based SAN. The blades were also running Linux.

http://hardforum.com/showthread.php?s=4b610d44185016e56c57c943de8023f7&t=1662769

Has anyone ever mixed ConnectX-2 and 3 together?

Thanks!

mlxali
Enthusiast

Sparticus13,

Yes, ConnectX-2 and 3 can be mixed.

More details on SMB and RDMA can be found under:
http://www.mellanox.com/page/file_storage

tdatatilitytjk
Contributor

Sparticus13,

Just curious how this turned out. I've used IB/RDMA/SMB3 Windows-to-Windows, but I'm curious how you presented your Storage Spaces SAN to the VMware hosts? NFS or iSCSI?

My understanding is that VMware will talk SRP to SANs; however, I don't know if Windows 2012 will present itself as such.

Any updates you can provide would be welcomed, I'm sure others are tracking this too!

Thanks,

Tom

mlxali
Enthusiast

To use RDMA for storage with ESX hosts, you have the following options:

1. File level storage:

1.1. SMB: it can be used only with a Windows host that "sees" the RDMA device, so you can install a Windows VM with Pass-Through and connect to Windows VMs using SMB. Currently Mellanox RDMA devices support Fixed Pass-Through (FPT), where only one VM can claim the RDMA device; the next release will support SR-IOV, where you can let multiple VMs have Pass-Through to the same RDMA device.

1.2. NFS/iSCSI: supported today on top of the IP over InfiniBand (IPoIB) driver, which is available for VMware ESX 5.x (see the rough sketch after this list):
http://www.mellanox.com/page/products_dyn?&product_family=36&mtag=vmware_drivers

2. Block level storage:

2.1. SRP for InfiniBand: it's supported on ESX hosts (as a client), but you must have a storage appliance that supports an SRP target (an SRP software target on Windows is not available today).

2.2. iSER for RDMA (can be Ethernet or InfiniBand): same as SRP.
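As a rough illustration of option 1.2, the sketch below (Python with the pyVmomi SDK; host names, credentials, and the NFS export are placeholders, not tested configuration) mounts an NFS export served over IPoIB as a datastore on each ESX host.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

# Sketch of option 1.2: mount an NFS export (reachable over IPoIB) as a
# datastore on every ESX host. All names and addresses are placeholders.
si = SmartConnect(host="vcenter.example.local", user="administrator",
                  pwd="secret", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        spec = vim.host.NasVolume.Specification(
            remoteHost="192.168.50.10",   # IPoIB address of the file server
            remotePath="/export/vmdata",
            localPath="ipoib_nfs",        # datastore name as seen by ESX
            accessMode="readWrite",
            type="NFS")
        host.configManager.datastoreSystem.CreateNasDatastore(spec)
    view.DestroyView()
finally:
    Disconnect(si)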

Sparticus13
Contributor

Currently I have not implemented any of this yet; I am still in the design stage. Right now we have a single-node SAN that uses hardware RAID with a single physical install of Windows Server 2012 on it, so I am not using Storage Spaces to create my volumes, just hardware RAID-10 with multiple partitions. On the serving side we use blade servers running ESXi 5.1, all with Windows Server 2012 VMs on them. How we access the data on the SAN currently depends on the workload.

For our SQL servers (backend servers/network), the SAN data is accessed via iSCSI with dedicated partitions/targets on the SAN; SQL is using a two-node Active/Passive failover Windows cluster. The application servers (IIS/frontend) access the SAN through Windows network shares over SMB 3.0. They are set up as Active/Active IIS nodes with a shared IIS config and a central network share on the SAN. There is a load balancer in front of them directing traffic to each node and monitoring health.

The future goal is to use SMB 3.0 for both the SQL and app servers, connecting to a multi-node Windows Server 2012 cluster as the SAN. This design will use a JBOD appliance with no hardware RAID. It will connect to at least two physical Windows Server 2012 nodes using HBAs, and the nodes will form a Windows cluster using the new Scale-Out File Server role. The JBOD will use Windows Storage Spaces to create a cluster-wide shared volume that each node has access to.
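To make that a bit more concrete, here is a very rough, untested outline of the provisioning steps on one of the SOFS nodes, driven from Python so it matches the other sketches in this thread. The cmdlets are the Server 2012 Storage Spaces and clustering ones as I understand them, every friendly name is a placeholder, and the disk initialization/format and Cluster Shared Volume steps are omitted.

import subprocess

# Untested outline: pool the JBOD disks with Storage Spaces, carve a mirrored
# virtual disk, add the Scale-Out File Server role, and publish an SMB share.
# Friendly names are placeholders; initializing/formatting the virtual disk
# and adding it as a Cluster Shared Volume are deliberately left out here.
steps = [
    "New-StoragePool -FriendlyName JbodPool "
    "-StorageSubSystemFriendlyName (Get-StorageSubSystem)[0].FriendlyName "
    "-PhysicalDisks (Get-PhysicalDisk -CanPool $true)",
    "New-VirtualDisk -StoragePoolFriendlyName JbodPool -FriendlyName Data1 "
    "-ResiliencySettingName Mirror -UseMaximumSize",
    "Add-ClusterScaleOutFileServerRole -Name SOFS1",
    "New-SmbShare -Name AppData -Path C:\\ClusterStorage\\Volume1\\AppData "
    "-FullAccess 'DOMAIN\\Web Servers'",
]

for step in steps:
    subprocess.run(["powershell", "-NoProfile", "-Command", step], check=True)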

The goal will be to use RDMA with SMB 3.0 for both the SAN cluster nodes and the blades. I was aware that I could use a VM with Pass-Through to do this, but that limits it to just a single VM, so that's no good for me. I was thinking SR-IOV could be an option, and it looks like it will be in the next release from Mellanox. I was unaware of the "NFS/iSCSI supported today on top of the IP over InfiniBand (IPoIB) driver for VMware ESX 5.x" option. Still, we would like to stop using iSCSI and go with SMB 3.0 RDMA directly.

I think the most promising solution for my design will be when VMware finishes the new paravirtual RDMA driver/device for use in the VMs. This should solve everything, but it is not yet ready.

So I think I will start with normal IB networking without RDMA, using SMB 3.0 throughout. I should be able to use RDMA with SMB 3.0 within the SAN cluster, as those are all physical Server 2012 systems with no virtualization. Then I can do normal IB networking from the SAN to the blades for now and upgrade to RDMA once the paravirtual RDMA is ready.
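Once that's running, my plan for a baseline is just to time large sequential writes from one of the guest VMs to the share over IPoIB, so there's a number to compare against when RDMA becomes possible. A crude sketch (the UNC path and sizes are placeholders):

import os
import time

# Crude sequential-write baseline from a guest VM to the SOFS share.
# UNC path and sizes are placeholders; this is not a substitute for a real
# benchmark, just a quick before/after sanity number.
SHARE_PATH = r"\\SOFS1\AppData\throughput_test.bin"
BLOCK = b"\0" * (4 * 1024 * 1024)   # 4 MiB per write
TOTAL_BLOCKS = 2048                  # ~8 GiB total

start = time.time()
with open(SHARE_PATH, "wb", buffering=0) as f:
    for _ in range(TOTAL_BLOCKS):
        f.write(BLOCK)
    os.fsync(f.fileno())
elapsed = time.time() - start

mib = len(BLOCK) * TOTAL_BLOCKS / (1024.0 * 1024.0)
print("Wrote %.0f MiB in %.1f s (%.0f MiB/s)" % (mib, elapsed, mib / elapsed))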

Thanks mlxali for getting back to us with official support info regarding the products and technology!

HawkieMan
Enthusiast

NTFS and ZFS in themselves are not capable of anything except aligning data and managing some metadata for the stored data. Dedupe is a next-level function, and while inline dedupe is a nice feature in small and lower mid-size environments, as soon as you go to big data it's useless because of the performance hit. In recent years Microsoft has also introduced dedupe, and has for a long time had Volume Shadow Copy Service as a good alternative to snapshots. So to say it lacks features is wrong; it's all about finding what you wish to use and what your price range is. I use ZFS-based storage solutions as well as Windows Server 2012 R2 storage at my home lab, and I actually like both systems. At work I handle NetApp storage, where the functionality is quite similar to ZFS, and with our amount of data the NetApp has to run dedupe as a scheduled task on files of a specific age.

So comparing them like this is like comparing apples and oranges. I like the simplified management of the Windows storage solutions, but I also like the endless possibilities in my ZFS-based solutions. However, when it comes to storing metadata and access information across server clusters and so on, the MS solution so far is better, but things change all the time.
