Setting up & Configuring EMC VNXe3150 iSCSI SAN Storage with High Availability
The EMC VNXe3150 is finally installed and configured, and we're almost ready to start the transition from the old EMC AX4 to the new VNXe3150. In the initial stage I found it a bit difficult to get it configured correctly, as every EMC document says something different, especially when it comes to iSCSI high availability: they mix up NFS HA and iSCSI HA. In reality, NFS HA and iSCSI HA differ from each other. Simply put, NFS uses link aggregation (LAG/LACP) while iSCSI does not.
Equipment
Configuration on the VNXe
The configuration part is a bit of a dilemma when it comes to iSCSI connectivity. On the VNXe I set up two iSCSI servers, one for Storage Processor A and one for Storage Processor B. Each SP has two IP addresses configured, one per Ethernet interface (eth2 & eth3).
All the Ethernet interfaces are configured with an MTU of 9000 for jumbo frames.
Storage Resources Configuration
Storage Element | iSCSI Server | Port | IP Address | MAC Address | pSwitch Port | VMkernel iSCSI PortGroup
iSCSI-A | iSCSI_ServerA | eth2 | 10.90.8.1 | 8:0:1b:57:71:3e | 2/g1 | iSCSI-01 10.90.8.78 vmk1
iSCSI-A | iSCSI_ServerA | eth3 | 10.100.8.1 | 8:0:1b:82:78:dd | 1/g1 | iSCSI-02 10.100.8.78 vmk2
iSCSI-B | iSCSI_ServerB | eth2 | 10.90.8.2 | 8:0:1b:58:59:0f | 2/g2 | iSCSI-01 10.90.8.78 vmk1
iSCSI-B | iSCSI_ServerB | eth3 | 10.100.8.2 | 8:0:1b:cd:f3:26 | 1/g2 | iSCSI-02 10.100.8.78 vmk2
As you can see in the screenshot and configuration table above, each Storage Processor has two Ethernet ports, and each Ethernet port is connected to an iSCSI pSwitch; eth2 on SPA is paired with eth2 on SPB. Since both of these interfaces are connected to the same pSwitch and are configured on the same IP subnet, a single iSCSI VMkernel portgroup on that subnet can reach both Storage Processors through a single physical adapter ("vmnic").
VNXe Connectivity Diagram
iSCSI Switches Configuration
Both network switches are configured as master and slave in a stack. Strictly speaking, this type of configuration does not require stacking the switches, because each matching pair of SP Ethernet ports is connected to the same switch, i.e. SPA-Eth2 on pSwitch1 & SPB-Eth2 on pSwitch1. With an NFS configuration, however, you will need to stack the switches, since you have to configure LAG/LACP for true high availability.
Set up jumbo frames on the two iSCSI gigabit switches so that all ports support jumbo frames. The commands below configure all the ports with an MTU of 9000:
Console(config)#Interface range ethernet all
Console(config-if)#mtu 9000
ESXi host configuration
Each of the gigabit ports on the ESXi host is connected to a physical iSCSI switch. Two VMkernel portgroups were created on vSwitch1, and each iSCSI VMkernel interface is mapped to its own physical interface on the ESXi host.
Switch Name  Num Ports  Used Ports  Configured Ports  MTU   Uplinks
vSwitch1     128        8           128               9000  vmnic2,vmnic6

PortGroup Name  VLAN ID  Used Ports  Uplinks
iSCSI-02        0        1           vmnic6
iSCSI-01        0        1           vmnic2

vmk1  iSCSI-01  IPv4  10.90.8.78   255.255.255.0  10.90.8.255   00:50:56:6e:ea:87  9000  65535  true  STATIC
vmk2  iSCSI-02  IPv4  10.100.8.78  255.255.255.0  10.100.8.255  00:50:56:64:c0:6d  9000  65535  true  STATIC
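For reference, a configuration like the one shown above can also be built from the ESXi shell. This is only a sketch, assuming ESXi 5.x `esxcli` syntax, with the vSwitch, portgroup, uplink, and IP names taken from the tables above; adjust to your environment:

```shell
# Create vSwitch1 with jumbo frames and both iSCSI uplinks
esxcli network vswitch standard add --vswitch-name=vSwitch1
esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic2
esxcli network vswitch standard uplink add --vswitch-name=vSwitch1 --uplink-name=vmnic6

# Create the two iSCSI portgroups and their VMkernel interfaces (MTU 9000)
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=iSCSI-01
esxcli network vswitch standard portgroup add --vswitch-name=vSwitch1 --portgroup-name=iSCSI-02
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=iSCSI-01 --mtu=9000
esxcli network ip interface add --interface-name=vmk2 --portgroup-name=iSCSI-02 --mtu=9000
esxcli network ip interface ipv4 set --interface-name=vmk1 --type=static --ipv4=10.90.8.78 --netmask=255.255.255.0
esxcli network ip interface ipv4 set --interface-name=vmk2 --type=static --ipv4=10.100.8.78 --netmask=255.255.255.0

# Port binding requires each iSCSI portgroup to have exactly one active uplink
esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-01 --active-uplinks=vmnic2
esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-02 --active-uplinks=vmnic6
```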
To verify that each iSCSI portgroup reaches the correct Ethernet interface on the Storage Processor, use vmkping with -I, which lets you specify the source VMkernel interface for a given iSCSI target. This tests the whole path end-to-end, from the ESXi host through the physical iSCSI switches to the VNXe storage, and confirms that connectivity can flow with jumbo frames.
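One caveat worth noting: vmkping fragments large packets by default, so `-s 9000` can succeed even if something in the path is still at MTU 1500. To really prove jumbo frames end-to-end, you can additionally set the "don't fragment" bit with `-d` and size the payload to fit inside a single 9000-byte frame (this is a suggested extra check, not part of the original test):

```shell
# -d = don't fragment; 8972 bytes of payload + 8 (ICMP) + 20 (IP) = 9000-byte frame
vmkping -d -s 8972 -I vmk1 10.90.8.1
vmkping -d -s 8972 -I vmk2 10.100.8.1
```

If these pings fail while ordinary pings succeed, some device in the path is dropping or fragmenting jumbo frames.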
iSCSI Adapter Port Binding
Both iSCSI VMkernel portgroups have to be enabled for port binding in the ESXi iSCSI initiator adapter.
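Port binding can also be configured from the CLI. A sketch, assuming the software iSCSI adapter is `vmhba33` (a placeholder; check yours with `esxcli iscsi adapter list`):

```shell
# Bind both iSCSI VMkernel interfaces to the software iSCSI adapter
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2

# Verify the bindings took effect
esxcli iscsi networkportal list --adapter=vmhba33
```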
Connectivity Results
~ # vmkping -I vmk1 10.90.8.1 -c 50 -s 9000
PING 10.90.8.1 (10.90.8.1): 9000 data bytes
9008 bytes from 10.90.8.1: icmp_seq=0 ttl=255 time=0.596 ms
9008 bytes from 10.90.8.1: icmp_seq=1 ttl=255 time=0.575 ms
9008 bytes from 10.90.8.1: icmp_seq=2 ttl=255 time=0.548 ms
--- 10.90.8.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.548/0.573/0.596 ms
~ # vmkping -I vmk1 10.90.8.2 -c 50 -s 9000
PING 10.90.8.2 (10.90.8.2): 9000 data bytes
9008 bytes from 10.90.8.2: icmp_seq=0 ttl=255 time=0.591 ms
9008 bytes from 10.90.8.2: icmp_seq=1 ttl=255 time=0.617 ms
9008 bytes from 10.90.8.2: icmp_seq=2 ttl=255 time=0.603 ms
~ # vmkping -I vmk2 10.100.8.1 -c 50 -s 9000
PING 10.100.8.1 (10.100.8.1): 9000 data bytes
9008 bytes from 10.100.8.1: icmp_seq=0 ttl=255 time=0.634 ms
9008 bytes from 10.100.8.1: icmp_seq=1 ttl=255 time=0.661 ms
9008 bytes from 10.100.8.1: icmp_seq=2 ttl=255 time=0.642 ms
--- 10.100.8.1 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 0.634/0.661/0.708 ms
~ # vmkping -I vmk2 10.100.8.2 -c 50 -s 9000
PING 10.100.8.2 (10.100.8.2): 9000 data bytes
9008 bytes from 10.100.8.2: icmp_seq=0 ttl=255 time=0.694 ms
9008 bytes from 10.100.8.2: icmp_seq=1 ttl=255 time=0.658 ms
9008 bytes from 10.100.8.2: icmp_seq=2 ttl=255 time=0.690 ms
Add ESXi hosts to VNXe
Set up the ESXi hosts to access the VNXe iSCSI SAN storage. Browse to VNXe > Hosts > VMware, where you can find ESX hosts either by entering the IP address of the vCenter or the management address of the ESXi host itself. Then create a VMFS datastore on the VNXe and make sure you assign the ESXi host permission to access the newly created LUN.
After the LUN is presented to the ESXi host and formatted with VMFS, it's time to change the path selection policy from the default Fixed to Round Robin, and to change the Round Robin default IOPS limit in ESXi from 1000 to 1, which lets you utilize all the iSCSI paths:
esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops 1 --device=naa.6006048c2fb691695617fc52e06065a2
Once it's changed, you will see all paths as Active (I/O) for each LUN that was switched from Fixed to Round Robin.
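Rather than typing the device ID for every LUN, the same change can be scripted. A sketch that applies Round Robin plus the IOPS=1 tweak to every device matching the NAA prefix seen in the example above (`naa.6006048...`); the prefix is an assumption here, so adjust the pattern for your array:

```shell
# Apply Round Robin + IOPS=1 to every matching VNXe device
for dev in $(esxcli storage nmp device list | grep -o 'naa.6006048[0-9a-f]*' | sort -u); do
  esxcli storage nmp device set --device=$dev --psp=VMW_PSP_RR
  esxcli storage nmp psp roundrobin deviceconfig set --type=iops --iops=1 --device=$dev
done

# Verify the policy on all devices
esxcli storage nmp device list
```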
Failover – Failback Testing Scenarios
For the failover testing I presented a 500 GB LUN and created two virtual machines. The first runs Windows 2008 R2 Enterprise Edition with the Exchange 2010 roles installed.
The second virtual machine is a Windows 7 Professional client with Microsoft Outlook 2010 connected to an Exchange 2010 MAPI profile. Sending and receiving email internally works in normal operation.
Testing Networking
I tested failover under network failure by disconnecting one pNIC ("vmnic2", mapped to the iSCSI-01 portgroup) from vSwitch1 while vmkping -I vmk1 was running against both VNXe iSCSI target IPs, SPA-Eth2 "10.90.8.1" & SPB-Eth2 "10.90.8.2"; the pings continued without interruption. Likewise, when a Storage Processor (SPA) failed or was rebooted on the VNXe, the surviving Storage Processor (SPB) picked up the workload that SPA had been handling.
Ping over "vmnic6" (via vmk2) to 10.100.8.1 & 10.100.8.2:
# vmkping -I vmk2 10.100.8.1
PING 10.100.8.1 (10.100.8.1): 56 data bytes
64 bytes from 10.100.8.1: icmp_seq=0 ttl=255 time=0.229 ms
64 bytes from 10.100.8.1: icmp_seq=1 ttl=255 time=0.192 ms
64 bytes from 10.100.8.1: icmp_seq=2 ttl=255 time=0.238 ms
--- 10.100.8.1 ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 0.192/0.220/0.238 ms
~ # vmkping -I vmk2 10.100.8.2
PING 10.100.8.2 (10.100.8.2): 56 data bytes
64 bytes from 10.100.8.2: icmp_seq=0 ttl=255 time=0.235 ms
64 bytes from 10.100.8.2: icmp_seq=1 ttl=255 time=0.245 ms
--- 10.100.8.2 ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.235/0.240/0.245 ms
~ #
Relinking vmnic2 to vSwitch1 restored the pings to SPA-Eth2 & SPB-Eth2.
LUN paths resumed
Testing Power Failure of VNXe Storage Processors-A & Storage Processor-B
The second test was done by removing physical power from Storage Processor B and initiating vmkping to both Ethernet interfaces of SPB from both VMkernel interfaces vmk1 & vmk2. The pings continued, as traffic was rerouted via the peer SP's ports.
Ping results after the power was removed:
The result below shows that the Exchange VM continued to ping the client VM during the Storage Processor B shutdown.
I did the same with Storage Processor A and initiated pings to both Ethernet interfaces of SPA. The pings continued to both interfaces, pings inside each VM ("Exchange Server to client" and vice versa) continued as well, and the Exchange Server VM showed no freezes or errors in Event Viewer.
Conclusion
The VNXe3150's high-availability features at the storage level and the networking level ensure data protection against any single component failure at either level.
Could you explain why all 4 paths are active?
On my VNXe3100 I only see two paths, which are both Active (I/O), but no paths from the non-owning SP.
How did you get 4 paths while only using two ports on each SP?
Also, does the 3150 suffer from the same issue as the 3100, where an SP failure may leave a LUN owned by that SP offline for a minute or two?
Hi Justin,
I don't have an explanation for it; I noticed it too but haven't found anything. I just connected my HP DL380 G6 server back to the same VNXe3150 and noticed four paths with the Fixed policy, whereas on my production Dell R710 servers with the same configuration I can only see two paths.
Regarding the SP failure part, I haven't had a single ping time out, nor any Exchange/DC server disruption. Both the client VM and the Exchange Server VM on the same VNXe3150 LUN kept pinging each other, the VM consoles stayed responsive, and Exchange/DC logged no errors in Event Viewer, except for a VMTools timeout error in the Application event log; I don't know whether that's related to the SP failure or just appears on its own.
Thanks,
I'm just starting to get my VNXe3150 configured for my new vSphere setup. I'm new to VMware, vSphere and SAN storage. Your post was very helpful for me, thank you.
For what reason did you put eth2 and eth3 on different subnets on each of the SPs?
Hi Supertim82,
Yes, I put Eth2 and Eth3 on SPA on different subnets, and Eth2 & Eth3 on SPB are on the same subnets, matching SPA:
SPA-Eth2 10.90.8.1
SPB-Eth2 10.90.8.2
SPA-Eth3 10.100.8.1
SPB-Eth3 10.100.8.2
Well, it seems EMC gives contradictory information when it comes to HA and MPIO/jumbo-frame configuration. In pure VMware terms, MPIO means multipathing, used along with jumbo frames and port binding.
I have set up my VNXe3150 with two subnets:
SPA-Eth2 10.90.8.1
SPB-Eth2 10.90.8.2
SPA-Eth3 10.100.8.1
SPB-Eth3 10.100.8.2
Now, per VMware Support, port binding is not a supported configuration across different broadcast domains; VMware supports port binding only within a single broadcast domain.
EMC's HA documentation, along with the discussion by Matthew Brender & Henri Hamalainen in the "VNXe Front-End Networks with VMWare" Ask the Expert thread, gives contradictory information: the HA document describes HA with different subnets, while the Ask the Expert thread clearly elaborates on port binding, Round Robin, and changing the default IOPS to gain higher storage performance.
Which is the supported configuration, I don't know; I've left it to EMC to answer that for me.
Thanks,
Hi Paul,
I still haven't gotten any feedback regarding your queries, but I will definitely share it once I hear anything.
Thanks,
Hi Paul,
I have good news for your query. I did the same thing on the other host, esx03: I removed both VMkernel portgroups from the network bindings, rebooted the host, put the VMkernel portgroups back into the network bindings, rescanned, and it started showing four paths for each LUN.
It seems there is a bug related to certain hardware vendors, as I haven't noticed this on the HP DL380 G6 server, whereas the Dell R710 shows the issue; the workaround is as stated above.
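For anyone hitting the same issue, the remove/re-add workaround above can also be driven from the CLI. A sketch, again assuming the software iSCSI adapter is `vmhba33` (a placeholder; check with `esxcli iscsi adapter list`):

```shell
# Remove the port bindings, then reboot the host
esxcli iscsi networkportal remove --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal remove --adapter=vmhba33 --nic=vmk2

# After the reboot, re-add the bindings and rescan the adapter
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
esxcli storage core adapter rescan --adapter=vmhba33
```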
I will follow the same steps on the last host, esx01, and will update you shortly.
Thanks,
Yes, I did that on the last ESX host and now all four paths are showing.
This may be a dumb question, but I'm going to ask it anyway.
Is it assumed, or do you have to have a layer 3 switch in order to make this work?
Hi Josh,
You don't have to have a layer 3 switch, nor do the switches need to be connected/stacked together, for this to work.
But if you use NFS datastores and want full redundancy, then you will have to enable link aggregation between the VNXe ports as well as on the pSwitches and the vSwitch.
Thanks
Hi. Thanks for the excellent write-up!
I think the tests would be more reliable if you used some kind of disk-based test instead of a simple network ping. Even if a VM loses disk connectivity, it will still be alive in memory and respond to pings, since that doesn't require disk access.
Hi, I have the exact same SAN. I'm just wondering: were you able to present the LUN to both SPs on the VNXe3150? I can only select an iSCSI server on either SP A or SP B, not both. With your setup, I'm assuming the reason 4 paths show up on your datastore is that the LUN is presented to both SPs on the VNXe3150. Correct me if I'm wrong.