Setting up & Configuring EMC VNXe3150 iSCSI SAN Storage with High Availability
The EMC VNXe3150 is finally installed and configured, and we're almost ready to start the transition from the old EMC AX4 to the new VNXe3150. In the initial stage I found it a bit difficult to get it configured correctly, as every EMC document says something different, especially when it comes to iSCSI high availability: they mix up NFS HA and iSCSI HA. In reality, NFS HA and iSCSI HA differ from each other. Simply put, NFS uses link aggregation (LAG/LACP) while iSCSI does not.
Equipment
Configuration on the VNXe
The configuration part is a bit of a dilemma when it comes to iSCSI connectivity. On the VNXe I set up two iSCSI servers, one for Storage Processor A and one for Storage Processor B. Each SP has two IP addresses configured, one per Ethernet interface (eth2 & eth3).
All the Ethernet interfaces are configured with an MTU of 9000 for jumbo frames.
Storage Resources Configuration
Storage Element | iSCSI Server | Port | IP Address | MAC Address | pSwitch Port | VMkernel iSCSI PortGroup
iSCSI-A | iSCSI_ServerA | eth2 | 10.90.8.1 | 8:0:1b:57:71:3e | 2/g1 | iSCSI-01 10.90.8.78 vmk1
iSCSI-A | iSCSI_ServerA | eth3 | 10.100.8.1 | 8:0:1b:82:78:dd | 1/g1 | iSCSI-02 10.100.8.78 vmk2
iSCSI-B | iSCSI_ServerB | eth2 | 10.90.8.2 | 8:0:1b:58:59:0f | 2/g2 | iSCSI-01 10.90.8.78 vmk1
iSCSI-B | iSCSI_ServerB | eth3 | 10.100.8.2 | 8:0:1b:cd:f3:26 | 1/g2 | iSCSI-02 10.100.8.78 vmk2
As you can see in the screenshot and configuration table above, each Storage Processor has two Ethernet ports, and each Ethernet port is connected to an iSCSI pSwitch; eth2 on SPA is paired with eth2 on SPB. Since both of these interfaces are connected to the same pSwitch and are configured on the same IP subnet, a single iSCSI VMkernel portgroup on that subnet can reach both Storage Processors through a single physical adapter ("vmnic").
VNXe Connectivity Diagram
iSCSI Switches Configuration
Both network switches are configured as master and slave in a stack. Strictly speaking, this type of configuration does not require stacking the switches, because each matching pair of SP Ethernet ports is connected to the same switch, i.e. SPA-Eth2 on pSwitch1 & SPB-Eth2 on pSwitch1. With an NFS configuration, however, you will need to stack the switches, since you have to configure LAG/LACP for true high availability.
Set up jumbo frames on the two iSCSI gigabit switches so that all ports support jumbo frames. The commands below configure all the ports with an MTU of 9000:
Console(config)#Interface range ethernet all
Console(config-if)#mtu 9000
ESXi host configuration
Each of the gigabit ports on the ESXi host is connected to a physical iSCSI switch. Two VMkernel portgroups were created on vSwitch1, and each iSCSI VMkernel interface is mapped to its own physical interface on the ESXi host.
Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch1     128        8           128               9000  vmnic2,vmnic6

PortGroup Name  VLAN ID  Used Ports  Uplinks
iSCSI-02        0        1           vmnic6
iSCSI-01        0        1           vmnic2

vmk1  iSCSI-01  IPv4  10.90.8.78   255.255.255.0  10.90.8.255   00:50:56:6e:ea:87  9000  65535  true  STATIC
vmk2  iSCSI-02  IPv4  10.100.8.78  255.255.255.0  10.100.8.255  00:50:56:64:c0:6d  9000  65535  true  STATIC
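For reference, a configuration like the one shown above can also be built from the ESXi shell. This is only a sketch, assuming ESXi 5.x `esxcli` syntax, with the vSwitch, portgroup, uplink, and IP names taken from the tables above; adjust to your environment:

```shell
# Create vSwitch1 with jumbo frames and both iSCSI uplinks
esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic2
esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic6

# Create the two iSCSI portgroups and their VMkernel interfaces (MTU 9000)
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=iSCSI-01
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=iSCSI-02
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=iSCSI-01 --mtu=9000
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=iSCSI-02 --mtu=9000
esxcli network ip interface ipv4 set --interface-name=vmk1 --type=static --ipv4=10.90.8.78 --netmask=255.255.255.0
esxcli network ip interface ipv4 set --interface-name=vmk2 --type=static --ipv4=10.100.8.78 --netmask=255.255.255.0

# Port binding requires each iSCSI portgroup to have exactly one active uplink
esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-01 --active-uplinks=vmnic2
esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-02 --active-uplinks=vmnic6
```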
To verify that each iSCSI portgroup reaches the correct Ethernet interface on the Storage Processor, use vmkping with -I, which lets you specify the source VMkernel interface for a given iSCSI target. This tests the whole path end-to-end, from the ESXi host through the physical iSCSI switches to the VNXe storage, and confirms that connectivity can flow with jumbo frames.
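One caveat worth noting: vmkping fragments large packets by default, so `-s 9000` can succeed even if something in the path is still at MTU 1500. To really prove jumbo frames end-to-end, you can additionally set the "don't fragment" bit with `-d` and size the payload to fit inside a single 9000-byte frame (this is a suggested extra check, not part of the original test):

```shell
# -d = don't fragment; 8972 bytes of payload + 8 (ICMP) + 20 (IP) = 9000-byte frame
vmkping -d -s 8972 -I vmk1 10.90.8.1
vmkping -d -s 8972 -I vmk2 10.100.8.1
```

If these pings fail while ordinary pings succeed, some device in the path is dropping or fragmenting jumbo frames.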
iSCSI Adapter Port Binding
Both iSCSI VMkernel portgroups have to be enabled for port binding in the ESXi iSCSI initiator adapter.
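Port binding can also be configured from the CLI. A sketch, assuming the software iSCSI adapter is `vmhba33` (a placeholder; check yours with `esxcli iscsi adapter list`):

```shell
# Bind both iSCSI VMkernel interfaces to the software iSCSI adapter
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2

# Verify the bindings took effect
esxcli iscsi networkportal list --adapter=vmhba33
```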
Connectivity Results
~ # vmkping -I vmk1 10.90.8.1 -c 50 -s 9000
PING 10.90.8.1 (10.90.8.1): 9000 data bytes
9008 bytes from 10.90.8.1: icmp_seq=0 ttl=255 time=0.596 ms
9008 bytes from 10.90.8.1: icmp_seq=1 ttl=255 time=0.575 ms
9008 bytes from 10.90.8.1: icmp_seq=2 ttl=255 time=0.548 ms
--- 10.90.8.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.548/0.573/0.596 ms
~ # vmkping -I vmk1 10.90.8.2 -c 50 -s 9000
PING 10.90.8.2 (10.90.8.2): 9000 data bytes
9008 bytes from 10.90.8.2: icmp_seq=0 ttl=255 time=0.591 ms
9008 bytes from 10.90.8.2: icmp_seq=1 ttl=255 time=0.617 ms
9008 bytes from 10.90.8.2: icmp_seq=2 ttl=255 time=0.603 ms
~ # vmkping -I vmk2 10.100.8.1 -c 50 -s 9000
PING 10.100.8.1 (10.100.8.1): 9000 data bytes
9008 bytes from 10.100.8.1: icmp_seq=0 ttl=255 time=0.634 ms
9008 bytes from 10.100.8.1: icmp_seq=1 ttl=255 time=0.661 ms
9008 bytes from 10.100.8.1: icmp_seq=2 ttl=255 time=0.642 ms
--- 10.100.8.1 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.634/0.661/0.708 ms
~ # vmkping -I vmk2 10.100.8.2 -c 50 -s 9000
PING 10.100.8.2 (10.100.8.2): 9000 data bytes
9008 bytes from 10.100.8.2: icmp_seq=0 ttl=255 time=0.694 ms
9008 bytes from 10.100.8.2: icmp_seq=1 ttl=255 time=0.658 ms
9008 bytes from 10.100.8.2: icmp_seq=2 ttl=255 time=0.690 ms
Add ESXi hosts to VNXe
Set up the ESXi hosts to access the VNXe iSCSI SAN storage. Browse to VNXe > Hosts > VMware, where you can find ESX hosts either by entering the IP address of the vCenter or the management address of the ESXi host itself. Then create a VMFS datastore on the VNXe and make sure you assign the ESXi host permission to access the newly created LUN.
After the LUN is presented to the ESXi host and formatted with VMFS, it's time to change the path selection policy from the default Fixed to Round Robin, and to change the Round Robin default IOPS limit in ESXi from 1000 to 1, which lets you utilize all the iSCSI paths:
esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops 1 --device=naa.6006048c2fb691695617fc52e06065a2
Once it's changed, you will see all paths as Active (I/O) for each LUN that was switched from Fixed to Round Robin.
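Rather than typing the device ID for every LUN, the same change can be scripted. A sketch that applies Round Robin plus the IOPS=1 tweak to every device matching the NAA prefix seen in the example above (`naa.6006048...`); the prefix is an assumption here, so adjust the pattern for your array:

```shell
# Apply Round Robin + IOPS=1 to every matching VNXe device
for dev in $(esxcli storage nmp device list | grep -o 'naa.6006048[0-9a-f]*' | sort -u); do
  esxcli storage nmp device set --device=$dev --psp=VMW_PSP_RR
  esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=$dev
done

# Verify the policy on all devices
esxcli storage nmp device list
```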
Failover – Failback Testing Scenarios
For the failover testing I presented a 500 GB LUN and created two virtual machines. The first runs Windows 2008 R2 Enterprise Edition with the Exchange 2010 roles installed.
The second virtual machine is a Windows 7 Professional client with Microsoft Outlook 2010 connected to an Exchange 2010 MAPI profile. Sending and receiving email internally works in normal operation.
Testing Networking
I tested failover under network failure by disconnecting one pNIC ("vmnic2", mapped to the iSCSI-01 portgroup) from vSwitch1 while vmkping -I vmk1 was running against both VNXe iSCSI target IPs, SPA-Eth2 "10.90.8.1" & SPB-Eth2 "10.90.8.2"; the pings continued without interruption. Likewise, when a Storage Processor (SPA) failed or was rebooted on the VNXe, the surviving Storage Processor (SPB) picked up the workload that SPA had been handling.
Ping over "vmnic6" (via vmk2) to 10.100.8.1 & 10.100.8.2:
# vmkping -I vmk2 10.100.8.1
PING 10.100.8.1 (10.100.8.1): 56 data bytes
64 bytes from 10.100.8.1: icmp_seq=0 ttl=255 time=0.229 ms
64 bytes from 10.100.8.1: icmp_seq=1 ttl=255 time=0.192 ms
64 bytes from 10.100.8.1: icmp_seq=2 ttl=255 time=0.238 ms
--- 10.100.8.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.192/0.220/0.238 ms
~ # vmkping -I vmk2 10.100.8.2
PING 10.100.8.2 (10.100.8.2): 56 data bytes
64 bytes from 10.100.8.2: icmp_seq=0 ttl=255 time=0.235 ms
64 bytes from 10.100.8.2: icmp_seq=1 ttl=255 time=0.245 ms
--- 10.100.8.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.235/0.240/0.245 ms
~ #
Relinking vmnic2 to vSwitch1 restored the pings to SPA-Eth2 & SPB-Eth2.
LUN paths resumed
Testing Power Failure of VNXe Storage Processors-A & Storage Processor-B
The second test was done by removing physical power from Storage Processor B and initiating vmkping to both Ethernet interfaces of SPB from both VMkernel interfaces vmk1 & vmk2. The pings continued, as traffic was rerouted via the peer SP's ports.
Ping results after the power was removed:
The result below shows that the Exchange VM continued to ping the client VM during the Storage Processor B shutdown.
I did the same with Storage Processor A and initiated pings to both Ethernet interfaces of SPA. The pings continued to both interfaces, pings inside each VM ("Exchange Server to client" and vice versa) continued as well, and the Exchange Server VM showed no freezes or errors in Event Viewer.
Conclusion
The VNXe3150's high-availability features at the storage level and the networking level ensure data protection against any single component failure at either level.
Could you explain why all 4 paths are active?
On my VNXe3100 I only see two paths, which are both Active (I/O), but no paths from the non-owning SP.
How did you get 4 paths while only using two ports on each SP?
Also, does the 3150 suffer from the same issue as the 3100, where an SP failure may leave a LUN owned by that SP offline for a minute or two?
Hi Justin,
I don't have an explanation for it; I noticed it too but haven't found anything. I just connected my HP DL380 G6 server back to the same VNXe3150 and noticed four paths with the Fixed policy, whereas on my production Dell R710 servers with the same configuration I can only see two paths.
Regarding the SP failure part, I haven't had a single ping time out, nor any Exchange/DC server disruption. Both the client VM and the Exchange Server VM on the same VNXe3150 LUN kept pinging each other, the VM consoles stayed responsive, and Exchange/DC logged no errors in Event Viewer, except for a VMTools timeout error in the Application event log; I don't know whether that's related to the SP failure or just appears on its own.
Thanks,
I'm just starting to get my VNXe3150 configured for my new vSphere setup. I'm new to VMware, vSphere and SAN storage. Your post was very helpful for me, thank you.
For what reason did you put eth2 and eth3 on different subnets on each of the SPs?
Hi Supertim82,
Yes, I put Eth2 and Eth3 on SPA on different subnets, and Eth2 & Eth3 on SPB are on the same subnets, matching SPA:
SPA-Eth2 10.90.8.1
SPB-Eth2 10.90.8.2
SPA-Eth3 10.100.8.1
SPB-Eth3 10.100.8.2
Well, it seems EMC gives contradictory information when it comes to HA and MPIO/jumbo-frame configuration. In pure VMware terms, MPIO means multipathing, used along with jumbo frames and port binding.
I have set up my VNXe3150 with two subnets:
SPA-Eth2 10.90.8.1
SPB-Eth2 10.90.8.2
SPA-Eth3 10.100.8.1
SPB-Eth3 10.100.8.2
Now, per VMware Support, port binding is not a supported configuration across different broadcast domains; VMware supports port binding only within a single broadcast domain.
EMC's HA documentation, along with the discussion by Matthew Brender & Henri Hamalainen in the "VNXe Front-End Networks with VMWare" Ask the Expert thread, gives contradictory information: the HA document describes HA with different subnets, while the Ask the Expert thread clearly elaborates on port binding, Round Robin, and changing the default IOPS to gain higher storage performance.
Which is the supported configuration, I don't know; I've left it to EMC to answer that for me.
Thanks,
Hi Paul,
I still haven't gotten any feedback regarding your queries, but I will definitely share it once I hear anything.
Thanks,
Hi Paul,
I have good news for your query. I did the same thing on the other host, esx03: I removed both VMkernel portgroups from the network bindings, rebooted the host, put the VMkernel portgroups back into the network bindings, rescanned, and it started showing four paths for each LUN.
It seems there is a bug related to certain hardware vendors, as I haven't noticed this on the HP DL380 G6 server, whereas the Dell R710 shows the issue; the workaround is as stated above.
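For anyone hitting the same issue, the remove/re-add workaround above can also be driven from the CLI. A sketch, again assuming the software iSCSI adapter is `vmhba33` (a placeholder; check with `esxcli iscsi adapter list`):

```shell
# Remove the port bindings, then reboot the host
esxcli iscsi networkportal remove --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal remove --adapter=vmhba33 --nic=vmk2

# After the reboot, re-add the bindings and rescan the adapter
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
esxcli storage core adapter rescan --adapter=vmhba33
```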
I will follow the same steps on the last host, esx01, and will update you shortly.
Thanks,
Yes, I did that on the last ESX host and now all four paths are showing.
This may be a dumb question, but I'm going to ask it anyway.
Is it assumed, or do you have to have a layer 3 switch in order to make this work?
Hi Josh,
You don't have to have a layer 3 switch, nor do the switches need to be connected/stacked together, for this to work.
But if you use NFS datastores and want full redundancy, then you will have to enable link aggregation between the VNXe ports as well as on the pSwitches and the vSwitch.
Thanks
Hi. Thanks for the excellent write-up!
I think the tests would be more reliable if you used some kind of disk-based test instead of a simple network ping. Even if a VM loses disk connectivity, it will still be alive in memory and respond to pings, since that doesn't require disk access.
Hi, I have the exact same SAN. I'm just wondering: were you able to present the LUN to both SPs on the VNXe3150? I can only select an iSCSI server on either SP A or SP B, not both. With your setup, I'm assuming the reason 4 paths show up on your datastore is that the LUN is presented to both SPs on the VNXe3150. Correct me if I'm wrong.