VMware Cloud Community
jmacdaddy
Enthusiast

Dell MD3000i iSCSI with 3.5

I have gotten one of our test servers to attach to our new MD3000i, but only after upgrading from 3.0.2 Update 1 to 3.5. We have not yet attempted attaching a second server and sharing a LUN; that comes later.

Here is our configuration

1 Dell PE2650 with VI 3.5

2 Broadcom onboard NICs dedicated to storage

2 Cisco 3750 gigabit switches for our fabric (no uplink between the switches).

Dell MD3000i with 2 storage processors (each SP with 2 gigabit NICs).

One problem I am having is with multipathing. The two NICs are in a team, with each NIC plugged into a different switch, and both are set to active. With the four NICs on the SAN, I should be able to see four paths to my LUN.

When I set the load balancing policy to either VPort ID or MAC hash, I am able to scan targets, but I only see 2 paths, both through one of the two NICs. If I then unplug the NIC carrying the visible paths and rescan targets, I see four paths (2 dead, 1 active, 1 on). If I reboot the server, I go back to just two paths.

If I set the load balancing policy to IP hash and rescan targets, I can't see anything.

Am I doing something wrong in my teaming configuration that prevents me from seeing the 4 paths, or is this something that needs to be configured on the SAN?

Thanks

P.S. I know the MD3000i is not officially supported yet.

jmacdaddy
Enthusiast

Any MD3000i or iSCSI/NIC teaming experts out there willing to chime in?

tsightler
Hot Shot

Well, my guess is that, like seemingly everyone else who tries to set up this configuration, you've probably built an invalid network configuration. Having two NICs in a team that go to two different switches that are not connected to each other is not valid. I don't know exactly what your configuration looks like, and I don't know anything about the MD3000i, but the typical configuration is something like this:

vSwitch1 -> pNIC1 -> pSwitch1 -> Controller1/Port1 (10.10.10.1)

vSwitch1 -> pNIC1 -> pSwitch1 -> Controller2/Port1 (10.10.10.2)

vSwitch1 -> pNIC2 -> pSwitch2 -> Controller1/Port2 (10.10.10.3)

vSwitch1 -> pNIC2 -> pSwitch2 -> Controller2/Port2 (10.10.10.4)

If the two physical switches are not trunked together, this is a completely invalid configuration and you will probably only be able to see half of your ports at any given time. Why? You've created a network where a given IP address is only reachable via one physical path from the server, but the vSwitch has no way of knowing which pNIC any given IP might be reachable on. Let's look at what happens with the various load balancing algorithms:

IP Hash -- This load balancing policy picks an uplink by hashing the IP addresses on each packet and using the result to choose which pNIC to send it out. If the hash for 10.10.10.2 picks pNIC2, the packet will never arrive at the storage array, since the only way to reach 10.10.10.2 is via pNIC1.
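
To make that concrete, here is a rough sketch in Python (not VMware's actual code; it assumes the commonly described scheme where the uplink index comes from an XOR of the source and destination addresses modulo the number of active uplinks, and the vmkernel address and port layout are hypothetical):

    # Toy model: a vmkernel port teamed over two pNICs, with the two physical
    # switches not linked, so each array address is reachable via only one pNIC.
    import ipaddress

    UPLINKS = ["pNIC1", "pNIC2"]                    # pNIC1 -> pSwitch1, pNIC2 -> pSwitch2
    REACHABLE_VIA = {
        "pNIC1": {"10.10.10.1", "10.10.10.2"},      # controller ports behind pSwitch1
        "pNIC2": {"10.10.10.3", "10.10.10.4"},      # controller ports behind pSwitch2
    }
    VMKERNEL_IP = "10.10.10.50"                     # hypothetical vmkernel address

    def ip_hash_uplink(src, dst):
        """Pick an uplink from a hash of source and destination IP (simplified)."""
        h = int(ipaddress.ip_address(src)) ^ int(ipaddress.ip_address(dst))
        return UPLINKS[h % len(UPLINKS)]

    for target in sorted(REACHABLE_VIA["pNIC1"] | REACHABLE_VIA["pNIC2"]):
        chosen = ip_hash_uplink(VMKERNEL_IP, target)
        status = "reachable" if target in REACHABLE_VIA[chosen] else "unreachable"
        print(f"{target}: hashed to {chosen} -> {status}")

With these made-up addresses, two of the four targets hash to the pNIC that cannot reach them, which matches only ever seeing half the paths.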

VPort ID -- This load balancing policy treats the virtual switch as having "ports" just like a physical switch; it hashes the "port number" and sends all traffic for a given "virtual port" out a single pNIC. Since the software iSCSI initiator uses only a single connection via the VMkernel, and thus only a single "virtual port", every packet from the iSCSI initiator goes out of the same pNIC. The initiator therefore tries to reach all four IP addresses out of that one pNIC and will only be able to reach half of them.
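
And the same toy model for the port-ID policy: because the software initiator sits on a single virtual port, the hash input never changes, so every target is chased down the same uplink (again just a sketch, with a made-up port number):

    # Toy model: "route based on originating virtual port ID" with one vmkernel port.
    UPLINKS = ["pNIC1", "pNIC2"]
    VMKERNEL_PORT_ID = 16                           # hypothetical virtual port number

    def port_id_uplink(port_id):
        # The destination plays no part in the decision.
        return UPLINKS[port_id % len(UPLINKS)]

    for target in ("10.10.10.1", "10.10.10.2", "10.10.10.3", "10.10.10.4"):
        print(f"{target}: sent via {port_id_uplink(VMKERNEL_PORT_ID)}")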

Of course, if you pull the cable, failover kicks in, the packets are forced out of the other pNIC, and it magically "finds" the other paths.

The main reason for this issue is that the software iSCSI initiator in ESX only supports a single session. With hardware iSCSI this is not a problem: each HBA acts as a separate initiator and would only see two paths, but the multipathing software would combine the 2 paths from each of the 2 HBAs into a single LUN with 4 paths.

In my opinion the only workable solution is to create an actual network that meets the normal rules of networking (the virtual world does not somehow suspend the rules of Networking 101). The easiest way to do this is to simply interconnect the two switches, either with a trunk or, since you have Cisco 3750s, the StackWise cable. This provides a significant level of redundancy while providing a single network that works well with the software iSCSI initiator's failover methods. It's not perfect, since the two fabrics are not completely separate. SAN admins from a Fibre Channel background will balk at not having completely separate fabrics, which helps in the case of an administrative failure in zoning or a complete fabric failure, but it's still probably the best option given the current limitations of the software iSCSI initiator in ESX, as this failure mode is fairly uncommon in IP networks. Even with the switches stacked you can still lose or replace a switch without losing connectivity or incurring downtime, though it is still possible that an administrative error or traffic storm could take down the network, even if that likelihood is remote.

Now, some people have reported that they can leverage the IP Hash load balancing policy to make the separate switches work anyway. Since the IP hashing algorithm is deterministic (the same IP will always be sent out of the same pNIC), you could theoretically determine which pNIC would be used for each IP address and configure the storage array ports to use those IPs. I'm not convinced this is a good idea, as it seems fragile, especially once you start thinking about failure modes like storage array controller failures rather than just network failures, but since the MD3000i appears to be an "active/active" array, I guess it might work.
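
If someone did want to try that workaround, the planning step would look something like the sketch below: enumerate candidate array addresses and group them by the uplink the hash would pick, then assign addresses that hash to pNIC1 to the ports behind pSwitch1 and vice versa. This assumes the same simplified XOR-mod hash as above and hypothetical addresses, so verify the real behaviour before relying on it.

    # Sketch: group candidate array IPs by the uplink the (assumed) hash selects.
    import ipaddress
    from collections import defaultdict

    UPLINKS = ["pNIC1", "pNIC2"]
    VMKERNEL_IP = int(ipaddress.ip_address("10.10.10.50"))    # hypothetical

    by_uplink = defaultdict(list)
    for last_octet in range(1, 21):                           # candidate addresses
        addr = f"10.10.10.{last_octet}"
        idx = (VMKERNEL_IP ^ int(ipaddress.ip_address(addr))) % len(UPLINKS)
        by_uplink[UPLINKS[idx]].append(addr)

    for uplink, addrs in by_uplink.items():
        print(uplink, "->", ", ".join(addrs))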

Others have reported that you can just use the pNICs with a failover policy and that the iSCSI software initiator will rescan when the link fails. I'm not convinced that this would be truly robust either, but I suppose it's not out of the question, as long as you put the IP addresses of all of the ports in as discovery addresses and make sure the iSCSI initiator is configured for continuous discovery (which I think is the default). Still, that doesn't change the fact that the configuration is technically "broken".

Later,

Tom

jmacdaddy
Enthusiast
Enthusiast
Jump to solution

I think I am getting it. So with a NIC team, the vSwitch sees the two physical uplinks as a single logical port when it comes to forwarding. If the two uplinks were not in a team, but were instead two distinct logical ports, this would not be a problem, correct? The vSwitch would maintain its MAC forwarding table and would have no problem forwarding packets to the correct physical switch. Let me guess: all uplink ports on a virtual switch are in a team whether you like it or not. Is that correct?

Since you cannot load balance iSCSI traffic, it would make sense to not team your uplinks.

Shane_G
Contributor

I agree that the network config is not valid.

I recently set up an ESX 3.0.2u1 environment with 2 x 2900's and an MD3000i. I used separate VLAN/Subnets for each switch and separate vSwitches for each iSCSI VLAN - worked well.

Each ESX Server looked like this:

vSwitch1 -> NIC1 -> pSwitch1 -> SPA_NIC1 (Subnet 1) and SPB_NIC1 (Subnet 1)

vSwitch2 -> NIC2 -> pSwitch2 -> SPA_NIC2 (Subnet 2) and SPB_NIC2 (Subnet 2)
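
To illustrate the idea (the subnets and addresses here are made up): each VMkernel port can only reach the array ports in its own subnet, so both uplinks carry traffic and each SP remains reachable through the other switch if one path fails. A quick Python sketch of which array ports each VMkernel interface reaches:

    # Sketch of the two-subnet layout with hypothetical addresses: each VMkernel
    # port reaches only the array ports that share its subnet.
    import ipaddress

    VMK_PORTS = {
        "vSwitch1/vmk-iscsi1": ipaddress.ip_interface("192.168.10.10/24"),
        "vSwitch2/vmk-iscsi2": ipaddress.ip_interface("192.168.20.10/24"),
    }
    ARRAY_PORTS = {
        "SPA_NIC1": ipaddress.ip_address("192.168.10.101"),
        "SPB_NIC1": ipaddress.ip_address("192.168.10.102"),
        "SPA_NIC2": ipaddress.ip_address("192.168.20.101"),
        "SPB_NIC2": ipaddress.ip_address("192.168.20.102"),
    }

    for vmk, iface in VMK_PORTS.items():
        reachable = [name for name, ip in ARRAY_PORTS.items() if ip in iface.network]
        print(f"{vmk} ({iface.ip}) -> {', '.join(reachable)}")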

Doing an MD3000i with ESX 3.5 next week, will be interesting to see the difference.

Also, with regards to multipathing and teamed NICs, the VMware iSCSI config guide states:

"NIC teaming paths do not show up as multiple paths to storage in ESX Server

configurations. NIC teaming is handled entirely by the network layer and must be

configured and monitored separately from the ESX Server SCSI storage multipath

configuration."

jmacdaddy
Enthusiast

I thought that each ESX server could only have a single iSCSI software initiator bound? How did you set up each of your vSwitches to have its own initiator?

Shane_G
Contributor

Still only a single iSCSI initiator; it is bound to the VMkernel. Each vSwitch has a VMkernel port on a different subnet, so it's still the one initiator - it just uses different vSwitches to reach the different connected subnets.

jmacdaddy
Enthusiast

To recap, I want to be able to attach to a single LUN across two pNICs, each connected to a different switch, with the switches neither stacked nor trunked. I will set up a second vSwitch and configure it with a VMkernel port on a different subnet (VLAN), and I will reconfigure 2 of the 4 ports on my MD3000i to be on this new subnet. Will I now need to add a second entry under "Send Targets" for a SAN NIC on this new subnet, or will the SAN and iSCSI initiator be able to figure this out without any additional configuration?

Shane_G
Contributor

For the second vSwitch you will need both a Service Console and a VMkernel port in the new subnet.

I configured the MD3000i so that each SP had a port in each Subnet. I also configured all four MD3000i IP addresses as SendTargets in ESX.

Off to do this on ESX 3.5 today - will be interesting to see the differences.

jmacdaddy
Enthusiast

Finally got the chance to try your recommended setup and it works like a charm. Thank you very much for your assistance.

Shane_G
Contributor

Great to hear it is all working for you. I did this same config on an MD3000i and ESX 3.5 yesterday and it was all good. Multipathing, etc. all working fine.

jmacdaddy
Enthusiast

Shane, how has LUN sharing worked? I have heard that sharing LUNs on an MD3000i between two ESX 3.5 hosts doesn't work right - something about the locking, and that a January firmware release from Dell will address it. I did not have enough time today to get another 3.5 host up and running for a test, so your feedback would be greatly appreciated.

M_Reynolds
Contributor

What sort of performance are you getting, Shane? I've got 2 x 2950s in a cluster and an MD3000i, and copying between LUNs on the array averages around 20 MB/s. Both LUNs are 3 x 300 GB 15k SAS drives in a RAID 5 configuration. That seems way too slow to me. I'm using 3.5.

Shane_G
Contributor

LUN sharing is fine. All working brilliantly. MD3000i configured with three LUNs all in one Host Group.

Shane_G
Contributor

I haven't done any LUN-to-LUN copies, but server performance is fine. Boot times are fast and the servers are running fine - including Exchange and SQL.

christianZ
Champion

Are you copying or cloning? 20 MB/s is indeed slow - I would expect at least 30 MB/s. I have seen a performance improvement on 3.5 - e.g. cloning went from 60 MB/s (3.0.2) to 80 MB/s (3.5) - though that was on an FC system. Maybe someone could take the tests from here: http://communities.vmware.com/thread/73745

That would give us a way to compare against other systems. Statements like "it is fast" or "VMs run fast on it" are not enough for a real comparison.

M_Reynolds
Contributor

The 20 MB/s was from a copy using Veeam FastSCP, so probably not the best tool. I will try IOmeter and post my results, but it won't be until the new year.

Chaton
Contributor

Hello,

I have been running into this problem since Friday: it is impossible to ping both controllers. I'll try this network configuration this evening at 18:30.

I hope it'll work! Thanks for the tips!

AlbertWT
Virtuoso

To All,

I just would like to share my experience: from many sources on the Net, I've learned that directly attaching 2 x 1 Gb Ethernet cables from the SAN to ESXi will not boost performance to 2 Gb. I haven't tried VLAN trunking on a managed switch to implement LACP, due to budget limitations.

From the attached screenshot you can see that MPIO is in the default mode, which limits my performance to a single cable connection.

hope this helps,

Kind Regards,

AWT

/* Please feel free to provide any comments or input you may have. */