VMware Cloud Community
virtualinstall
Enthusiast

iSCSI Port Binding Question

Hi,

Could do with a little help getting my head around the following:

I had a problem with a newly configured ESXi 5.1 host: at reboot it hung at "vmw_satp_lsi successfully loaded".  I linked this to http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=201708... and the fact that I had iSCSI port binding in place, but I'm not so sure.  The host and SAN are configured as follows:

ESXi host:

vSwitch1

vmk1 IP: 192.168.132.105              (Port Binding vmnic 1)

vmk2 IP: 192.168.133.105              (Port Binding vmnic 2)

Storage array:

Controller 0/1: 192.168.130.100

Controller 0/2: 192.168.131.100

Controller 0/3: 192.168.132.100

Controller 0/4: 192.168.133.100

Controller 1/1: 192.168.130.101

Controller 1/2: 192.168.131.101

Controller 1/3: 192.168.132.101

Controller 1/4: 192.168.133.101

It's the Dell MD3220i documentation that I'm finding a little unclear on this point, quoted below:

"In a configuration assign one VMkernel port for each physical NIC in the system. So if there are 3 NICs, assign 3 VMkernel Ports. This is referred to in VMware’s iSCSI SAN Configuration Guide as 1:1 port binding.

- Note: Port binding requires that all target ports of the storage array must reside on the same broadcast domain as the VMkernel ports because routing is not supported with port binding. See VMware KB #2017084 here. "

So the above advises 1:1 port binding for multiple NICs, then notes that all the target ports must be on the same broadcast domain as the VMkernel ports?
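For reference, 1:1 binding on the ESXi CLI looks roughly like the sketch below (not from the Dell doc; the port group names iSCSI-1/iSCSI-2 and the software iSCSI adapter name vmhba33 are assumptions - check yours with "esxcli iscsi adapter list"):

  # each iSCSI port group gets exactly one active uplink and no standby (required for binding)
  esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-1 --active-uplinks=vmnic1
  esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-2 --active-uplinks=vmnic2
  # bind one vmk per vmnic to the software iSCSI adapter
  esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
  esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
  # confirm the bindings
  esxcli iscsi networkportal list --adapter=vmhba33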

Any help understanding this would be appreciated.

Thanks

kermic
Expert

Currently it looks like you have iSCSI ports on 4 subnets.

The keywords here are "routing is not supported with port binding".

By "same broadcast domain" here we mean that all bound vmk ports and all storage ports are on same VLAN and same IP subnet so that they can talk to each other without a need to go through router / gateway.

In your case the array probably advertises all of its ports to the host. During boot the host tries to access e.g. storage port 0/1 from vmk1; since they are on different IP subnets, the host sends the request to the gateway and appears to hang while waiting for a response from the array. The same is repeated for the remaining storage port / vmk combinations.
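A quick way to see this from the host (a side note, not from the original reply; the -I flag for choosing the source vmkernel interface is available in ESXi 5.1):

  # target on the same subnet as vmk1 - should reply
  vmkping -I vmk1 192.168.132.100
  # target on another subnet - expect timeouts, since this would have to be routed,
  # and routed iSCSI is not supported with port binding
  vmkping -I vmk1 192.168.130.100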

Why did you decide to configure multiple subnets for iSCSI?

WBR

Imants

virtualinstall
Enthusiast

Thanks for the reply. Dell initially set up the SAN using this configuration.  I think you're right in that all the ports are advertised but the ESXi host won't be able to access them.  What are my options for this from the ESXi side?

Gkeerthy
Expert

First of all, iSCSI doesn't need a separate VLAN for each iSCSI target port; a single VLAN is enough.

Here you have a total of 8 ports on the iSCSI storage side, and you need 8 pNICs and 8 VMkernel ports on the ESX side; only then will you get the full bandwidth. That is, there should be an end-to-end mapping, one ESXi VMkernel IP to one storage target IP.

Otherwise you need to use EtherChannel or a LAG.

kermic
Expert

Is the array already in production?

What was the reason for configuring each array port in a different subnet? Are there any requirements behind this, or was it done "just because one felt like doing so"?

If there are no specific reasons for multiple subnets, nothing / no one is currently accessing the array, and you have the power to make decisions / reconfigure the array, I'd suggest putting all array ports in the same subnet and the same VLAN. Then reconfigure your VMkernel ports to have IPs from that subnet / reside on that VLAN and do a rescan.
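If you go that route, the host side is just re-addressing the vmk ports and rescanning; a minimal sketch, assuming vmk1/vmk2 stay bound to software adapter vmhba33 and the array ends up on 192.168.132.0/24:

  esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=192.168.132.105 --netmask=255.255.255.0 --type=static
  esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=192.168.132.106 --netmask=255.255.255.0 --type=static
  # pick up the re-addressed targets
  esxcli storage core adapter rescan --adapter=vmhba33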

WBR

Imants

kermic
Expert

Gopinath Keerthyrajan wrote:

Here you have a total of 8 ports on the iSCSI storage side, and you need 8 pNICs and 8 VMkernel ports on the ESX side; only then will you get the full bandwidth. That is, there should be an end-to-end mapping, one ESXi VMkernel IP to one storage target IP.

Otherwise you need to use EtherChannel or a LAG.

I strongly disagree with this one. This will be true only if:

there is a single host accessing the array

there are at least 8 LUNs on the array, all accessed at full throttle at the same time, every LUN via a different NIC (one active path per LUN at a time, unless you're using a 3rd-party multipathing plugin)

You will be able to access the iSCSI array even if you have just a single RJ45 port on your host. Typically people use two for availability.

WBR

Imants

virtualinstall
Enthusiast

Can this be resolved by only using Static Discovery rather than Dynamic, so the unreachable iSCSI targets are not listed?  That would reduce the number of targets the host is aware of from 8 to 4, but possibly still cause problems because not all vmk ports could reach all targets?

Or configure the host to use 2 pNICs with either one or two vmk ports on the same subnet, then use static discovery.  I would then have all vmk ports on the same broadcast domain as the iSCSI targets the host can see?

That would leave me with two pNICs with 1:1 port binding, two vmk ports on the same subnet, able to access one port on each controller of the SAN in that subnet.

virtualinstall
Enthusiast

In effect it would look like this:

ESXi host:
vSwitch1
vmk1 IP: 192.168.132.105              (Port Binding vmnic 1)
vmk2 IP: 192.168.132.106              (Port Binding vmnic 2)
Storage array:
Controller 0/1: 192.168.130.100
Controller 0/2: 192.168.131.100
Controller 0/3: 192.168.132.100               (Static Discovery from ESXi host)
Controller 0/4: 192.168.133.100
Controller 1/1: 192.168.130.101
Controller 1/2: 192.168.131.101
Controller 1/3: 192.168.132.101               (Static Discovery from ESXi host)
Controller 1/4: 192.168.133.101
kermic
Expert

Yes, static discovery could do the trick. But still, if you're doing port binding then all VMkernel ports used by the SW iSCSI initiator need to be on the same subnet, which leaves you using a single port per controller on the array side if you can't change the array config.

And make sure that you remove all targets from the Dynamic Discovery tab. After that you'll need to reset the established iSCSI sessions, which can be done either via the vCLI (the esxcli iscsi session namespace) or by rebooting the host.

Your storage vendor should have some sort of whitepaper on best practices for configuring the array for use with the ESXi software iSCSI initiator. That is probably the most accurate source of information, including multipathing policy configuration and recommended firmware levels.
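On the CLI that cleanup is roughly the following (a sketch, not from the original reply; vmhba33 and the target IQN are placeholders - take the real IQN from the array's management interface):

  # list and remove the send-target (dynamic discovery) entries; repeat the remove per listed address
  esxcli iscsi adapter discovery sendtarget list --adapter=vmhba33
  esxcli iscsi adapter discovery sendtarget remove --adapter=vmhba33 --address=192.168.130.100:3260
  # add only the reachable portals as static targets
  esxcli iscsi adapter discovery statictarget add --adapter=vmhba33 --address=192.168.132.100:3260 --name=iqn.placeholder-md3220i-target
  esxcli iscsi adapter discovery statictarget add --adapter=vmhba33 --address=192.168.132.101:3260 --name=iqn.placeholder-md3220i-target
  # clear the established sessions, then rescan
  esxcli iscsi session remove --adapter=vmhba33
  esxcli storage core adapter rescan --adapter=vmhba33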

WBR

Imants

Gkeerthy
Expert

kermic wrote:

I strongly disagree with this one. This will be true only if:

there is a single host accessing the array

there are at least 8 LUNs on the array, all accessed at full throttle at the same time, every LUN via a different NIC (one active path per LUN at a time, unless you're using a 3rd-party multipathing plugin)

You will be able to access the iSCSI array even if you have just a single RJ45 port on your host. Typically people use two for availability.

WBR

Imants

Hi kermic - I won't agree with your comments.

First of all, the paths are managed by the ESX storage stack and not by the network stack, and ESX will use only one pNIC to send/receive traffic; this is the ESX architecture. Now, on the Dell there are 8 ports in total, that is 4 ports per controller, so whether you have a single host or multiple hosts, a single LUN or multiple LUNs, we need to have multiple paths. For best results, there are 8 iSCSI targets here, so you can have 8 paths. The Round Robin PSP will switch to the next path after 1000 IOPS, and there is no need for 3rd-party plugins; of course you can use them as well, but the ESX PSP is great.
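For reference, the path selection policy and the Round Robin switch point can be inspected and changed per device (a side sketch, not from the original reply; the naa ID is a placeholder):

  # show each device's PSP and working paths
  esxcli storage nmp device list
  # Round Robin moves to the next path every 1000 I/Os by default; it can be lowered, e.g. to 1
  esxcli storage nmp psp roundrobin deviceconfig set --device=naa.placeholder --type=iops --iops=1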

Now you have 8 paths, and if the ESX host has only 4 or 2 NICs you can still have 8 paths, but all the traffic will have to pass through those 2 or 4 NICs. That is why using more NICs for iSCSI is better; the Ethernet layer gets congested easily. In the FC case, by comparison, even with a dual 2 Gb HBA there won't be any bandwidth congestion.

What he can do here is create an EtherChannel/LAG between the storage and the iSCSI switch, that is, combine 4 storage Ethernet ports towards the iSCSI switch and give each one an IP. On the ESX side he can use 2 pNICs and 2 VMkernel port groups; this will be better than the current situation. There will still be a bottleneck on the ESX side because of the smaller number of NICs.

Gkeerthy
Expert

Only if you are using an iSCSI HBA or iSCSI offload in the pNIC can you have static discovery; if you are using the software iSCSI initiator then you can only have dynamic discovery. Let me know which one is your case.

virtualinstall
Enthusiast

kermic - thanks, I have been reading the whitepaper from Dell on the MD3220i and ESXi 5.0.

Gkeerthy - I didn't know that was the case with static and dynamic discovery. I was planning to use the software iSCSI adapter but have the option of using a dependent HBA as well (Broadcom BCM5709).

Gkeerthy
Expert

If you have iSCSI offload, try to use it; also compare the performance of the software and hardware initiators.

Gkeerthy
Expert

http://i.dell.com/sites/content/shared-content/data-sheets/en/Documents/PowerVault_MD_iSCSI_Deployme...

http://www.dell.com/downloads/global/products/pvaul/en/md-family-deployment-guide.pdf

http://www.dell.com/Learn/us/en/04/shared-content~data-sheets~en/Documents~VMware_ESX_4-1_Deployment...

Refer to the Dell guides for the best practices.

kermic
Expert

I'm not saying you're completely wrong.

However, no one here has mentioned issues related to link saturation yet :)

Theoretically an array with 8x 1 Gbps ports can produce 8 Gbps of throughput. Agreed?

Would you buy an 8-port 1 Gbps iSCSI array if you were designing for this type of load? How many spindles would you need for that?

You're talking about the RR PSP. How many paths per LUN are active (serving I/Os) at a time? (Hint: the number is the same for RR, MRU and Fixed.) What is the bandwidth of a single path if I have 1 Gbps ports on the host? Will it change if I have 2 or 8 ports on the host?
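For anyone following along, the active vs. standby path count being hinted at here is visible per device (a quick check, with a placeholder naa ID):

  # "Working Paths" shows what is actually serving I/O for the device
  esxcli storage nmp device list --device=naa.placeholder
  # full path listing, including active / standby state
  esxcli storage core path list --device=naa.placeholder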

Yes, link aggregation might help if there is link saturation between the array and the switch (multiple hosts producing load) and if that is supported by the array, but that's a few steps ahead. First of all we need to connect to that array :) and, if I'm not mistaken, we currently can't change the IPs on the array ports.

Just out of curiosity, have you seen a lot of environments where 1 Gbps iSCSI bandwidth is the bottleneck? From my experience the first thing that customers usually hit is the number of IOs that the array can deliver (first of all the # of spindles and then the CPU on the array). Throughput is rarely the issue. When it is, it's either something very specific (e.g. public streaming video servers) or it's a system that, through normal evolution, is about to migrate to a higher storage tier (FC or 10 Gbps). Otherwise what I typically see is that if we place ~20-30 average / regular VMs on a single host, those 1 Gbps iSCSI links are rarely loaded above 30% throughout the day. Utilization only increases during the backup window.

Again, it depends heavily on the case and on VM IO profiles. This is what I see every day. Maybe I'm living under a rock so I'd be happy to hear about your experience.

WBR

Imants

virtualinstall
Enthusiast

Thanks, just re-reading them all again as I'm starting to get a better understanding. One point in the Dell Deployment Guide is:

"Configuring hardware initiators from within VMware® ESXi5.0 Server™ is not supported on the MD platform"

Another point I've just read from http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=102564... is

Additional recommendations when configuring software iSCSI initiator on an ESX/ESXi host:

  • Use two or more dedicated network cards for iSCSI.
  • These network cards should be configured on the same network segment as the storage processors on your iSCSI array to avoid routing.
  • If possible connect the iSCSI network cards to different physical switches for additional redundancy.
  • VMware recommends using port binding (also known as a 1:1 mapping) to ensure iSCSI paths can be correctly multipathed by the host. This configuration is covered in the iSCSI SAN Configuration guides.
  • If you are using different networks for your iSCSI vmkernel ports then they need to be on separate vSwitches
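A quick way to check those points on the host (a side note, not from the KB; vmhba33 is again an assumed software adapter name):

  # which vmk ports are bound to the adapter, and their IPs / netmasks
  esxcli iscsi networkportal list --adapter=vmhba33
  esxcli network ip interface ipv4 get
  # which portals the adapter has been told about
  esxcli iscsi adapter discovery sendtarget list --adapter=vmhba33
  esxcli iscsi adapter discovery statictarget list --adapter=vmhba33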

I'm setting up a new test host now to get this worked out and all the above posts have been helpful, thanks.

kermic
Expert

Gopinath Keerthyrajan wrote:

Only if you are using an iSCSI HBA or iSCSI offload in the pNIC can you have static discovery; if you are using the software iSCSI initiator then you can only have dynamic discovery. Let me know which one is your case.

Now I'm being picky :)

Has anything drastically changed in vmkernel within last few days?

Any release notes you could refer to?

WBR

Imants

Gkeerthy
Expert

If it is regular VMs, OK, I agree, but if it is production VMs with heavy load then you will hit a bottleneck. Moreover, why do we need to limit ourselves; if the hardware is capable, we should use it. I know he didn't mention congestion here, but why can't we do things in the correct manner :)

Just like with NetApp or any iSCSI array, it is good to at least do an aggregation from the storage to the switch.

Then, as you know, the Ethernet medium is very sensitive to bandwidth usage. iSCSI will be great if the medium is lightly utilized; if NIC utilization is high, or reaches around 70%, then the TCP congestion control algorithm will start to kick in, and as a result you may experience latency. That's why with iSCSI-based storage we need to increase the iSCSI timeouts and apply other optimizations, just like with NetApp.

So in short, the transport medium is copper cable, so at some point during heavy load congestion will appear. This is my understanding; maybe I am wrong :)

The bottom line is: why can't we do it in a better way, so that at any point in time the infrastructure can accommodate any load or any VM with a high IO profile. I have faced many issues with this; people use few NICs and put lots of VMs on them, and then finally add more NICs. It all depends upon the design and the requirements.

virtualinstall
Enthusiast

From the Dell white paper on MD iSCSI Deployment:

Considerations involving subnet configuration

It is recommended that you have different ports on different subnets due to throughput and pathing considerations. The decision to use multiple subnets may depend upon the specific architecture and networking topology and may not be essential in all cases. On the MD3200i/MD3600i, setup each port on a different subnet with one disk group per controller, each disk group with one LUN per port.
Multiple subnets should be allocated by the number of array ports per controller. With the MD3200i you only get an active path to multiple ports on the same controller if they are on different subnets or VLANs. Since the MD3220i has four ports per controller you get your best throughput with four subnets.
On an MD3200i/MD3600i, the LUNs are assigned out of disk groups. Only one controller at a time has access to the LUNs. The alternate controller has access to those LUNs but only as a failover alternate path. This LUN/controller relationship is reported by the MD3200i/MD3600i to VMware which then dictates the paths (and the "optimal" condition of the array).
Regardless of the pathing method in VMware, you will only have one active port on each controller of the array if you do not create additional subnets. Without additional subnets that will cause a bottle-neck at the controller since pathing is at the disk group level rather than the LUN level for the MD300xi
Configuring RR to the same LUN across both controllers is not recommended.

This will be why it was set up with four subnets per controller.  At present we only have 2 pNICs per ESXi host available for iSCSI, so with this in mind I'm going with the following:

vSwitch1:
vmk1 IP: 192.168.132.105              (Port Binding vmnic 1)
vSwitch2:
vmk2 IP: 192.168.133.105              (Port Binding vmnic 2)
Storage array:
Controller 0/1: 192.168.130.100
Controller 0/2: 192.168.131.100
Controller 0/3: 192.168.132.100               (Static Discovery from ESXi host)
Controller 0/4: 192.168.133.100               (Static Discovery from ESXi host)
Controller 1/1: 192.168.130.101
Controller 1/2: 192.168.131.101
Controller 1/3: 192.168.132.101               (Static Discovery from ESXi host)
Controller 1/4: 192.168.133.101               (Static Discovery from ESXi host)
This will allow two additional pNICs in future if needed.  The only differences I can see between the whitepaper's recommended configuration and what was already in place are Dynamic Discovery (the ESXi host being aware of paths it can't reach) and a single vSwitch for the vmk ports.
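Roughly what that looks like on the CLI for the second vSwitch and the static targets (a sketch only; the port group name, the software adapter vmhba33 and the target IQN are placeholders):

  # second vSwitch with its own uplink and vmk port
  esxcli network vswitch standard add --vswitch-name=vSwitch2
  esxcli network vswitch standard uplink add --vswitch-name=vSwitch2 --uplink-name=vmnic2
  esxcli network vswitch standard portgroup add --vswitch-name=vSwitch2 --portgroup-name=iSCSI-2
  esxcli network ip interface add --interface-name=vmk2 --portgroup-name=iSCSI-2
  esxcli network ip interface ipv4 set --interface-name=vmk2 --ipv4=192.168.133.105 --netmask=255.255.255.0 --type=static
  esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk2
  # static discovery of the four reachable controller ports
  esxcli iscsi adapter discovery statictarget add --adapter=vmhba33 --address=192.168.132.100:3260 --name=iqn.placeholder-md3220i-target
  esxcli iscsi adapter discovery statictarget add --adapter=vmhba33 --address=192.168.133.100:3260 --name=iqn.placeholder-md3220i-target
  esxcli iscsi adapter discovery statictarget add --adapter=vmhba33 --address=192.168.132.101:3260 --name=iqn.placeholder-md3220i-target
  esxcli iscsi adapter discovery statictarget add --adapter=vmhba33 --address=192.168.133.101:3260 --name=iqn.placeholder-md3220i-target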
I'll be testing the above first as it's the recommended configuration in the Dell whitepaper, but I'm still perplexed by this note:

4. Note: Port binding requires that all target ports of the storage array must reside on the same broadcast domain as the VMkernel ports because routing is not supported with port binding. See VMware KB #2017084 here.

This is right under a graphic showing the following networking configuration using different subnets:
vSwitch1:
vmk1 IP: 172.20.2.11      vmnic1
vSwitch2:
vmk2 IP: 172.21.2.11      vmnic3
vSwitch3:
vmk3 IP: 172.22.2.11      vmnic5
vSwitch4:
vmk4 IP: 172.23.2.11      vmnic7
Gkeerthy
Expert

Has anything drastically changed in vmkernel within last few days?

Any release notes you could refer to?

WBR

Imants

Nothing changed; I know that if we add targets via dynamic discovery they also end up in the static list. I just quoted a KB which I recalled from memory. This KB is enough for you :)

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=100932...

The software iSCSI initiator didn't support static discovery in ESX/ESXi 3.5, but this limitation is gone from vSphere 4 onwards. I forgot this fact, thanks for reminding me :)
