VNXe 3100 Configuration for Redundancy with ESXi

I installed a new EMC VNXe 3100, principally as shared storage for ESXi.  The VNXe documentation is lacking in this area, so here is how I got redundancy working.

Equipment

VNXe 3100 with dual storage processors and the two optional quad-port Gigabit I/O modules

Two 16-port Gigabit switches, dedicated to iSCSI, with enough buffer capacity to handle jumbo frames

Three ESXi 4.1 U1 hosts, each with at least six Gigabit NICs

Configuration on the VNXe

I set up two aggregated links on the VNXe, eth2/eth3 and eth10/eth11, and enabled jumbo frames on both.  Only two sets of aggregated links are allowed on the VNXe 3100, and the change configures both storage processors at once.  Connect eth2/eth3 on each storage processor to one iSCSI switch and eth10/eth11 on each storage processor to the other switch; each switch carries a different IP network.  Then create two iSCSI servers, one for each storage processor, and give each its own IP address on the eth2/eth3 aggregate.  Then go back, edit each iSCSI server, and add a second IP address associated with eth10/eth11.  You end up with two iSCSI servers, each with two IP addresses on two different networks.
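For illustration only (these addresses are hypothetical, not taken from my setup), the end result looks something like this:

iSCSI Server A (owned by SPA):  192.168.10.50 on eth2/eth3 (switch 1)  and  192.168.20.50 on eth10/eth11 (switch 2)

iSCSI Server B (owned by SPB):  192.168.10.51 on eth2/eth3 (switch 1)  and  192.168.20.51 on eth10/eth11 (switch 2)

Each ESXi host will later get one VMkernel port on the 192.168.10.x network and one on the 192.168.20.x network, giving it a path to each iSCSI server over each switch.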

Switch Configuration

Enable jumbo frames on the two iSCSI Gigabit switches so that all the ports you are using support jumbo frames.
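How you enable jumbo frames depends on the switch model.  As a rough sketch, on the Cisco SG300-series switches mentioned in the comments below, jumbo frames are a global setting enabled from the CLI and a reboot is required for it to take effect (check your switch's own documentation before relying on this):

configure

port jumbo-frame

exit

copy running-config startup-config

reload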

ESXi host configuration

Connect a Gigabit port from each ESXi host to each of the two iSCSI switches.  Configure a VMkernel port on each of the two NICs with an appropriate IP address for its network.  Then change the VMkernel ports and vSwitches to an MTU of 9000 over an SSH connection to each ESXi host, using the commands below.
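If you prefer to build the vSwitch and VMkernel port from the CLI rather than the vSphere Client, a minimal sketch looks like this (the vSwitch, port group, NIC, and IP values are hypothetical; repeat with a second vSwitch, NIC, and address for the second iSCSI network):

esxcfg-vswitch -a vSwitch2                                  # create a vSwitch for the first iSCSI network

esxcfg-vswitch -L vmnic2 vSwitch2                           # link the NIC cabled to iSCSI switch 1

esxcfg-vswitch -A iSCSI1 vSwitch2                           # add a port group for the VMkernel port

esxcfg-vmknic -a -i 192.168.10.11 -n 255.255.255.0 iSCSI1   # create the VMkernel port with its IP address

The MTU changes that follow then apply to these VMkernel ports and vSwitches.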

esxcfg-vmknic -l

to list the VMkernel interfaces

esxcfg-vmknic -m 9000 <port group name of the VMkernel port>

to change the VMkernel port MTU

esxcfg-vmknic -l

to check to see that the change was made

esxcfg-vswitch -l

to list the vSwitches

esxcfg-vswitch -m 9000 <vSwitch name>

to set the vSwitch MTU to 9000

esxcfg-vswitch -l

to list the vSwitches again and confirm the change

vmkping -s 8000 <ip address of iSCSI server on VNXe>

This checks that the whole path is working with jumbo frames.  (A payload of 9000 will not work because the IP and ICMP headers add 28 bytes, pushing the packet over 9000; the largest payload that fits is 8972.)
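A stricter check is to send the largest payload that fits in a jumbo frame with the don't-fragment flag set:

vmkping -d -s 8972 <ip address of iSCSI server on VNXe>

to verify that a full-size jumbo frame passes the whole path without being fragmented.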

Enable the iSCSI software adapter on each ESXi host.
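For reference, a sketch of doing this from the ESXi shell instead of the vSphere Client (verify the command set on your 4.1 build):

esxcfg-swiscsi -e     # enable the software iSCSI initiator

esxcfg-swiscsi -q     # confirm that it is enabled

The VNXe iSCSI server addresses can then be added under the adapter's Dynamic Discovery tab in the vSphere Client if they are not picked up automatically.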

More Configuration on the VNXe

Register the ESXi hosts on the VNXe, then add a VMware VMFS datastore on the VNXe.  Make sure to give each of the ESXi hosts permission to the new LUN.  This will add the datastore on the ESXi hosts as well.

Back to vCenter

I then deleted the newly created VMFS datastores, because they are created with the default 1 MB block size and I usually need a larger one (with VMFS-3, the block size determines the maximum file size; 1 MB limits virtual disks to 256 GB).
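If you recreate the datastore from the CLI rather than the Add Storage wizard, a sketch looks like this (the device identifier and datastore name are hypothetical, and this assumes the VMFS partition created earlier is still in place):

vmkfstools -C vmfs3 -b 8m -S iSCSI_DS1 /vmfs/devices/disks/naa.600601604d00xxxxxxxxxxxxxxxx:1

which creates a VMFS-3 datastore named iSCSI_DS1 with an 8 MB block size on the first partition of the LUN.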

Next, change the path selection policy to Round Robin (VMware) via Manage Paths.  You should see two paths for each LUN, with Active (I/O) on each path.
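If you would rather script this than click through Manage Paths on every host, a sketch for ESXi 4.1 (the naa identifier is hypothetical):

esxcli nmp device list                                                        # find the naa identifier of the VNXe LUN

esxcli nmp device setpolicy --device naa.600601604d00xxxx --psp VMW_PSP_RR    # set Round Robin on that device

esxcli nmp device list --device naa.600601604d00xxxx                          # confirm the new path selection policy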

I have tested failover with network issues and it works well.  If a storage processor (SP) fails on the VNXe, the working SP picks up the iSCSI server from the failed one.

Comments

Can I ask you the following:

1. What switches are you using?

2. What sort of ping times do you get from the ESX host to the:

     a. switch IP

     b. VNXe IP

It would also be nice to see an IOMeter result from one of the default IOMeter tests.

Unfortunately we are a smaller company and I have this in production at this point, so I am not able to run tests that could impact performance outside a maintenance window.

Perhaps you could indicate the performance of some of the applications you are running (like MS SQL)?

Again, this is a small company, so my servers do not tax the capabilities of the VNXe.  IOPS and disk latency are highly related to the number and type of disks you have in the VNXe and associated with the LUN you are testing.  My system is small and skewed toward capacity rather than performance, so my results may not provide a good indication of the performance of the VNXe in any case.  As far as the design, I can tell you that I have seen evenly balanced data rates between the two iSCSI NICs on the ESXi hosts, and sustained read and write rates exceeding the capacity of a single NIC.

Thanks for the helpful article.  Just curious, how did you end up deciding to use iSCSI rather than NFS to take advantage of de-dupe?

My understanding is that the deduplication on the VNXe is file-based, not block-level.  So deduplication would only help you with files sitting directly on an NFS or CIFS share.  I am also not sure that you could get the same network redundancy and load sharing with NFS.

It is file-based de-dupe, and the de-dupe is single-instance storage, so you won't get any savings from that since no two VMDKs will be identical.  However, it also does some compression on the VMs, which should yield some benefit (maybe 15-25%).  YMMV.

I'll have to go looking for it, but Chad Sakac did write up a pretty good article on how to configure networking with NFS.   I do share some of the same concerns that link failover may not be quite as seamless as a block protocol failover.   Also, I think there is still a limitation of not being able to aggregate multiple paths on the ESX side for NFS datastores.

This is very helpful, as we're looking at a very similar config. If HoosierStorage happens to find that article on NFS networking, that would be great, too.

We were also thinking about doing an NFS share for dedupe, based on the EMC rep's recommendation. Is it possible to do both file and block level on the same network configuration? We were thinking that we could just make a VLAN for our iSCSI traffic with no route, but also configure another subnet with a route to our network for the CIFS traffic.

Here's a link with some good reading about NFS performance on EMC; it references the post I was remembering from Chad Sakac as well.  http://blog.scottlowe.org/2010/01/31/emc-celerra-optimizations-for-vmware-on-nfs/

With regard to your other question, you could have file and block on the same interface if it's a VNXe.  If it's a VNX, then no.  The file and block portions are served out of different network interfaces on the front-end, even if the back-end storage is shared between the two.

Hey Chris,

I personally do not see any advantages to an NFS share for a VMware system with the VNXe.  You will see no deduplication because the dedupe is only file-level.  Compression will only reduce performance, so I would not recommend that either.  We had thought a CIFS share would be of benefit as well, but when you look at the additional cost of virus scanning and backup, we decided to just provide shares from a Windows server on a VM.

Due to the way the compression functionality works, there is no hit on write performance.  With read performance, there could be overhead of around 10%.  So the question is going to be how much performance your VMs need.  If these are just light-usage boxes that sit at 5-25% utilized most of the day, then you probably won't even notice a change in performance, and you get the benefit of lower space usage from compression.  Also, compression can be disabled per VM.  So if you have 1 high-performance VM and 25 light-usage VMs, you can turn compression on for the 25 and leave it off for the high-performance VM.  It's not "all or nothing".

Hey Mate,

Thanks for the information, I also would be curious about what switches you used for this.

Are you using unmanaged/managed L2 switches? What model did you find had the best specifications?

Thanks.

Hi ddismore,

The above configuration does work; however, give some thought to what is going to happen if a storage processor fails (I have mine set up like yours as well).  The problem is that when you fail over a storage processor, its ports are moved over to the other processor, the subnets for the Ethernet pairs end up reversed, and you don't have access to the right set of IPs on the right switch.

The only way I can think of getting around this would be to set up a bonded link between the two switches, and then hope that you never have both a switch failure and a storage processor failure at the same time.  Generally the bond between the switches won't be used unless a processor fails over.

Regards

Matt

Matt,

After putting the above into Visio, I'm trying to understand how the pairs would end up on the wrong subnet.  I guess I need a better understanding of how the SP ports move over during a failure.  I'm still very new to iSCSI and am trying to figure out the best approach for redundancy.  We just got our new shiny VNXe last week!

Also, I'm curious as to why the aggregated links?  Was that done for performance?

Thanks!

Let's say SPA fails: the IP addresses on eth2/eth3 on SPA are placed on eth2/eth3 on SPB, and the same is true for the IP addresses on eth10/eth11.  Since the eth2/eth3 ports from both SPs are on the same switch and the same IP network, it works.  eth10/eth11 on both SPs are on the other network and switch.

I used the aggregated links to add capacity for the iSCSI links.  I also have the optional quad I/O modules, so I have six Gigabit ports per SP.  Without the quad-port modules, I would put eth2 and eth3 from each SP on the two different iSCSI networks.

I have tested SP failover during software updates.  Although I had little activity, my VMs did not notice anything.  I have also tested switch failures by powering off a switch, with no VM problems either.

As far as switches, I used two Cisco SG 300-20s.  I believe these come from the Linksys line and are low-end managed business switches.  They share the buffer cache among all the ports on the switch.  They are working for us, but may not be the best fit for all applications.

Makes sense to me and definitely looks good in Visio.  I'm probably going to run the extra cabling and prep for the future now, can't hurt.   We're running two HP 2910al switches, all CAT 6 cabling, jumbo frames enabled.  Thanks for posting this - helps quite a bit!  Much appreciated!

Some other thoughts: upgrade to the latest code on the VNXe, as it addresses some problems that hit me before the fix was available.  If you are using thin provisioning, be aware that old VMs migrated to the system may need to have the unused areas of their disks zeroed out, or they may not be as thin as you would like.

Sorry, ddismore is right that if you put eth2/3 from both SPs into the same switch, it does work as indicated.  I have tested it and noticed that to achieve better throughput (if you wanted to use a 4-port LACP trunk where possible), you'll be putting everything on the same switch.  If that switch fails, your failover situation will be degraded quite severely; it's quite noticeable when I fail one switch compared to the other.  I get much better performance out of having the two SPs in a crossover arrangement, but the redundancy is then lost for switch failover, so it's pointless, though the test was interesting.

I personally think it's a real bummer that you can't bond the two base ports with one of the I/O module ports so that you can go 3 and 3 ports.  This would be better "balancing" for MPIO setups as well.

Hey, off topic, but I asked EMC if there are any issues with having the AC power connected to different PDUs which are on different phase grids (see an article like this: http://searchdatacenter.techtarget.com/answer/Dual-power-supplies-and-power-grids ), like we do on the actual servers, which I know don't have issues with it.  But their reply was unclear; their technical support guys aren't sure themselves.

Is yours wired up to different power phases on the AC side?

So what happens if a link/port fails?  For instance, if the port on the VNXe for eth2 on SPA goes bad and eth2 was aggregated with eth3, what's the result?  That seems to be a big question that EMC doesn't cover in documentation.  Is it just degraded performance on that path to SPA, or does everything fail over to eth2/eth3 on SPB?

EDITED:  So, I found a pretty good EMC doc talking about this.  To add, in theory, what is described above will work and will only communicate down one link in the aggregation if the other fails.  BUT, EMC describes this in the NFS section of the document.  Is this the same for iSCSI?  Is there a difference in how the underlying network works between the two protocols?

https://community.emc.com/docs/DOC-12551  You must have a PowerLink account to access it.

Bulldog

I am sorry, I did not test that specifically, but I did unplug one of the two connections to the switch and the connection still worked through the remaining link.
