VMware Cloud Community
patters98
Enthusiast

vSphere Guest iSCSI multipathing failover not working

The failure seems to be at the vSphere networking level though, so it doesn't look like a Windows or SAN vendor problem.

I have built a two-node ESXi 4 (4.0.0 U1) cluster using an EqualLogic PS4000XV, which is connected as per the EqualLogic best practice documents. Namely, on each node:

. The vSphere software initiator is bound to four separate vmkernel ports (one per path is recommended).

. The iSCSI vSwitch uses two physical NICs, one into each dedicated iSCSI network switch (PowerConnect 5424).

. EqualLogic say you should bind half of the iSCSI vmkernels to a single pNIC with the other pNIC configured as unused, and the other half of the iSCSI vmkernels should reverse that arrangement.

. Jumbo frames are enabled on the iSCSI vSwitch on each ESXi node at creation time, on the vNICs, and on the physical network switches.

. The two physical iSCSI switches use flow control, and have a 6Gbps link aggregation trunk between them.

So far so normal. This all works and vSphere fails over its storage connections as expected when cables are disconnected.
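
For anyone wanting the detail: in 4.0 the jumbo frame and port binding parts of that setup have to be done from the CLI, since (as far as I recall) the vSphere Client can't set the vmkernel MTU at creation time. Per host it amounts to roughly the following - the vSwitch name, IP address, vmk and vmhba numbers here are just placeholders, yours will differ:

  esxcfg-vswitch -m 9000 vSwitch1                                        (jumbo MTU on the iSCSI vSwitch)
  esxcfg-vswitch -A iSCSI1 vSwitch1                                      (port group for the first vmkernel port)
  esxcfg-vmknic -a -i 192.168.200.21 -n 255.255.255.0 -m 9000 iSCSI1     (create the jumbo vmkernel port - repeat for iSCSI2-4)
  esxcli swiscsi nic add -n vmk1 -d vmhba33                              (bind each vmkernel port to the software initiator)
  esxcli swiscsi nic list -d vmhba33                                     (confirm the bindings)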

The problem is with guest VMs. For some of these I need to connect iSCSI LUNs using the Microsoft iSCSI Initiator within the VM so I can use Exchange- and SQL-aware off-host backup using Backup Exec - which leverages the EqualLogic Hardware VSS provider. For this I need to create two additional Virtual Machine Port Groups ("VM iSCSI 1" and "VM iSCSI 2") attached to the same vSwitch the vSphere iSCSI initiator is using, and create a vNIC in each (see screenshot):

As with those vmkernel ports, one is bound to the 1st pNIC with the 2nd pNIC unused, and the other Virtual Machine Port Group is configured in the opposite way. In the Guest OS (Windows 2003R2) the EqualLogic Host Integration Tools (3.3.1) see the two adapters and enable multipathing. The tools show an iSCSI connection to the SAN from each adapter.
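
For what it's worth, those port groups can be created in the GUI or from the console, e.g. (placeholder vSwitch name):

  esxcfg-vswitch -A "VM iSCSI 1" vSwitch1
  esxcfg-vswitch -A "VM iSCSI 2" vSwitch1

The Active/Unused arrangement is then set per port group in the vSphere Client by overriding the switch failover order on the NIC Teaming tab.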

What I discovered recently when I needed to do some maintenance is that if I disconnect one of the iSCSI pNICs the vSphere iSCSI connections fail over fine, but those iSCSI sessions that originate within the VMs do not. In the failed state I could not ping the SAN from these VMs. Since production systems were down I couldn't afford time to troubleshoot so I had to reconnect the physical cable and reboot them. All three VMs with Guest-attached LUNs failed in this way. I have of course double-checked the Networking configs on both ESXi cluster nodes. I am soon going to move all VMs onto one node for some unrelated maintenance, so I shall be able to test in more detail.

So, has anyone else encountered this behaviour? Is there perhaps a better way I should configure these VMs for iSCSI multipathing?

44 Replies
binoche
VMware Employee

I am not sure I have understood you correctly, so to summarize your results:

1. vSphere fails over successfully

2. the Windows VM fails to fail over

It looks like the Microsoft iSCSI Initiator in the Windows VM is not configured correctly; could you please recheck the EqualLogic best practice documents for the Microsoft iSCSI Initiator? Thanks.

After one pNIC is removed:

does the VM still have a working pNIC?

does the VM still have access to the EQL storage?

binoche, VMware VCP, Cisco CCNA

patters98
Enthusiast

Oh yes, I should probably add that I have a couple of physical servers using two NICs for iSCSI, configured with the Microsoft iSCSI Initiator and EqualLogic Host Integration Tools (same versions as running on the VMs), which fail over fine.

I shall carry out more testing with the VMs tomorrow.

AndreTheGiant
Immortal

To make guest iSCSI work, I suggest specifying explicit failover on the VM iSCSI 1 and 2 port groups.

Go into the port group properties and, for VM iSCSI 1, enable only the first NIC; for VM iSCSI 2, enable only the second NIC.

Then install the HIT kit in the guest and configure it as you would on a physical system.

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
patters98
Enthusiast

Thanks, that's how I've set them up since the start. No adapters in Standby for those port groups. Very strange. I have all my VMs migrated to one cluster node so I can test with the other today...

I have Jumbo frames enabled on those vNICs and I'm using vmxnet3 adapters in case that has any bearing on anything. The vSwitch is Jumbo frame enabled, as I mentioned above.
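
For what it's worth, the usual way to prove jumbo frames work end to end is a don't-fragment ping with an 8972-byte payload (which allows for the 28 bytes of IP/ICMP headers within a 9000-byte MTU) - the group IP below is just my example target:

  ping -f -l 8972 192.168.200.12       (from inside the guest)
  vmkping -d -s 8972 192.168.200.12    (from the ESXi console, for each vmkernel port)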

AndreTheGiant
Immortal

Have you also tried the vmxnet2 driver?

Are you using the latest version of HIT?

Andre

Andrew | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
patters98
Enthusiast

@AndreTheGiant Yes, latest version of HIT (3.3.1).

Ok, I've finally had some time to test this further. I can replicate the behaviour.

My guest VM has active iSCSI sessions from 192.168.200.49 (iSCSI 1 - vmNIC1) and 192.168.200.50 (iSCSI 2 - vmNIC5) to the EqualLogic (192.168.200.13 & 14).

I set up a continuous ping to 192.168.200.12 (the EqualLogic group IP) from the VM.

Then I pull vmNIC1's cable: I can no longer ping 192.168.200.49 from the storage LAN, but no packets are lost on the continuous ping. The SAN LUN is fine. I check the active sessions and, after a little while, the EqualLogic DSM kills the broken session and signs in to both endpoints from 192.168.200.50.

I replug and wait for the connections to use both adapters again.

So far so good...

Then I pull vmNIC5's cable and suddenly the continuous ping fails. Windows' routing does not fail over to the other adapter on that subnet. I can ping 192.168.200.49 from the storage LAN, but not 192.168.200.50 (as you would expect). The SAN LUN seems fine... but then it suddenly hangs. I open the iSCSI control panel and look at the EqualLogic tab. After a few seconds the connection logging shows a single connection using 192.168.200.50 and the SAN LUN reappears OK. However, the continuous ping from earlier does not recover. Could this be the root of the problem? Windows' routing? Or would the EqualLogic MPIO stack ignore this, as it appears to?
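
If anyone wants to watch the same thing from the guest's command prompt rather than the HIT GUI, the standard tools show it - the group IP here is specific to my setup:

  ping -t 192.168.200.12            (continuous ping to the group IP)
  iscsicli SessionList              (lists the initiator's active sessions and connections)
  netstat -an | find ":3260"        (shows which local address each iSCSI TCP connection is using)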

This does not fail over gracefully at all. VMware fails its own iSCSI initiator connections over without issue, but this is unusable. The volume spent some time unavailable, enough (based on previous experience) to crash SQL, Exchange, and Windows file shares, even though the Windows disk timeouts are modified by the EqualLogic Host Integration Tools to the prescribed values.

I shall now install a physical server in the same way so that I can determine whether it's a VMware or EqualLogic issue...

J_R__Kalf
Contributor

Hi Patters,

I'm coming at this from a bit of LeftHand experience; my EqualLogic experience isn't very deep yet.

What does the EqualLogic best practice manual state about the use of Microsoft's MPIO stack? I know that from a LeftHand point of view you need to install the Microsoft iSCSI initiator, but you never install the Microsoft MPIO stack; a different MPIO stack gets installed with the vendor-specific integration tools. Perhaps the cause of the issue is that your Windows VM is using the wrong MPIO stack?

Perhaps this is something that you still need to check... Let me know the outcome. I'm in a project right now planning the use of various EqualLogic 6500 E and X type boxes, and my Dell contact / storage expert hasn't told me anything about these issues.

Jelle

VMware VCP since 2006

--- VMware VCP since 2006 --- If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
patters98
Enthusiast

The EqualLogic HIT kit is well made in that regard. It installs the MS iSCSI Initiator for you with the correct options. The server I'm testing with is a fresh install - no historic MS iSCSI config.

patters98
Enthusiast

Ok, I've built an identical physical machine (clean install of 2003 R2 x64 SP2), all updates, latest Intel E1000 driver, Dell EqualLogic HIT kit 3.3.1.

Configured 2 iSCSI Intel E1000 NICs with MTU of 9014 bytes.

I repeated the tests that I did on the VM. It fails over perfectly: I was copying a 4GB DVD ISO across to the SAN and it paused fractionally, then continued.

My continuous ping to the EqualLogic group IP dropped two packets as the Windows routing for 192.168.200.0/24 (my dedicated storage LAN) failed across to the other NIC.

So it seems to be a problem that only affects my vSphere VMs, and the problem seems to be that, irrespective of whatever the EqualLogic MPIO driver is doing, Windows refuses to route traffic for 192.168.200.0/24 down the other adapter once the active route is killed. Why would this be any different on VMware than on a physical machine? Both machines are recent fresh installs.

What's really confusing is that on the VM, when one of the paths is down, I cannot ping the EqualLogic group IP at all, even though I can ping this VM's active iSCSI NIC from another host on the storage LAN (proving that the vSwitch config is correct). Again this points to Windows routing. Could the vmxnet3 driver cause this?

The route table of the VM (taken in the broken state) is as follows, and clearly shows that traffic for 192.168.200.0/24 can be routed down either adapter:

IPv4 Route Table
===========================================================================
Interface List
0x1 ........................... MS TCP Loopback interface
0x10003 ...00 50 56 86 72 d7 ...... vmxnet3 Ethernet Adapter #2
0x10004 ...00 50 56 86 32 8d ...... vmxnet3 Ethernet Adapter
0x10005 ...00 50 56 86 0e 4c ...... vmxnet3 Ethernet Adapter #3
===========================================================================
===========================================================================
Active Routes:
Network Destination        Netmask          Gateway       Interface  Metric
          0.0.0.0          0.0.0.0    192.168.111.1   192.168.111.90     10
        127.0.0.0        255.0.0.0        127.0.0.1        127.0.0.1      1
    192.168.111.0    255.255.255.0   192.168.111.90   192.168.111.90     10
   192.168.111.90  255.255.255.255        127.0.0.1        127.0.0.1     10
  192.168.111.255  255.255.255.255   192.168.111.90   192.168.111.90     10
    192.168.200.0    255.255.255.0   192.168.200.49   192.168.200.49     10
    192.168.200.0    255.255.255.0   192.168.200.50   192.168.200.50     10
   192.168.200.49  255.255.255.255        127.0.0.1        127.0.0.1     10
   192.168.200.50  255.255.255.255        127.0.0.1        127.0.0.1     10
  192.168.200.255  255.255.255.255   192.168.200.49   192.168.200.49     10
  192.168.200.255  255.255.255.255   192.168.200.50   192.168.200.50     10
        224.0.0.0        240.0.0.0   192.168.111.90   192.168.111.90     10
        224.0.0.0        240.0.0.0   192.168.200.49   192.168.200.49     10
        224.0.0.0        240.0.0.0   192.168.200.50   192.168.200.50     10
  255.255.255.255  255.255.255.255   192.168.111.90   192.168.111.90      1
  255.255.255.255  255.255.255.255   192.168.200.49   192.168.200.49      1
  255.255.255.255  255.255.255.255   192.168.200.50   192.168.200.50      1
Default Gateway:     192.168.111.1
===========================================================================
Persistent Routes:
  None

I guess all I can do now is switch my VM over to using Intel E1000 vNICs and test again...

patters98
Enthusiast

Testing again on the physical machine, the route table is amended to show only the active adapter when one iSCSI cable is pulled.

Now I can see the problem: the difference is that VMware does not change the link status of the vNIC when you pull the cable out of the vmNIC (i.e. the ESX server's physical NIC). The vSwitch continues to run of course, but is cut off from the storage LAN. I assume that the MPIO driver, regardless of Windows routing on the local machine, will detect this and fail over to another path. The problem seems to be that this can take a while.

I guess on the physical machine, because the link status on one adapter changes, Windows immediately starts to route all packets for the storage LAN down the remaining viable path before the MPIO driver fails the broken path, hence the much better failover speed.
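
This is easy to confirm, incidentally - the guest simply has no idea anything happened at layer 2:

  esxcfg-nics -l       (on the ESXi console: the pulled vmnic shows its link state as Down)
  ipconfig /all        (inside the guest: neither iSCSI vNIC ever reports "Media disconnected")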

My problem remains though: this VM iSCSI failover takes too long and takes the SAN volume offline, despite the fact that one viable path remained active throughout the test. Why is it not seamless? In reality it crashes SQL and Exchange, and even Windows file shares stop responding, so it is, in my eyes at least, worthless. Comments?

I think I should initiate a support case with EqualLogic since this is core product functionality, and I have read on here of some imminent patch to address other vSphere/EqualLogic issues (e.g. EqualLogic tell you to create multiple vmkernel ports, while VMware now say absolutely not until this hotfix - that level of contradiction is quite frankly frightening).

The issue seems to be with the latency of dead path detection in the EqualLogic DSM driver, but this effect seems to be masked on physical systems by Windows routing if you test by pulling out the LAN cable.

J_R__Kalf
Contributor

Have you made sure that on the Virtual Machine port groups the "Notify Switches" option is set to Yes?

According to the guides this will trigger an active ARP reset whenever there is a link failure at the vSwitch level.

Perhaps it's not directly Windows' fault, because Windows can't see a physical NIC failover; that happens on the underlying platform, a platform Windows isn't supposed to know about when a path goes down. Windows also doesn't know anything if a physical link between physical switches goes down somewhere in your storage network. The vSwitch / VM network should catch that failover in this particular case. Setting the "Notify Switches" option will trigger an active ARP reset so your storage array knows it has to communicate with your Windows VM through a different physical NIC on the same vSwitch.

Let me know the outcome, I'm seriously interested what the fix is for this particular case.

Jelle

VMware VCP since 2006

--- VMware VCP since 2006 --- If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
patters98
Enthusiast

I haven't changed the default setting in that regard, so it seems to be enabled. Here's what it looks like:

J_R__Kalf
Contributor

Mkay,

Assuming you have a test setup for this, test it with the second NIC set to Standby instead of Unused. It's probably not the setup recommended by Dell, but in the case of a single vmnic failure the standby NIC will actively take over all traffic, and the Notify Switches option makes sure that happens with the highest priority and the lowest latency. Hence Windows shouldn't feel anything about this switchover on the ESX platform.

Because whatever happens, I don't think Windows will receive a disconnect status on the NIC inside the VM if the physical link of the only active vmnic goes down on the vSwitch. This is by design, I think, otherwise isolated local vSwitches without physical NICs couldn't be an option.

Jelle


--- VMware VCP since 2006 --- If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
binoche
VMware Employee

vmnic5 is unused here; in your first post you said EqualLogic recommend binding half of the iSCSI vmkernels to one pNIC with the other pNIC configured as unused, and reversing that arrangement for the other half, so I guess you are using port binding here.

How about creating two virtual switches, one attached to vmnic1 and the other attached to vmnic5, and reconfiguring your test bed? Thanks.
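
A rough sketch of the commands (names are just examples, and the vmkernel ports currently using vmnic5 would need moving across too):

  esxcfg-vswitch -a vSwitch2                   (create a second vSwitch)
  esxcfg-vswitch -m 9000 vSwitch2              (keep jumbo frames)
  esxcfg-vswitch -U vmnic5 vSwitch1            (unlink vmnic5 from the existing iSCSI vSwitch)
  esxcfg-vswitch -L vmnic5 vSwitch2            (attach vmnic5 to the new vSwitch)
  esxcfg-vswitch -D "VM iSCSI 2" vSwitch1      (remove the second VM port group from the old vSwitch)
  esxcfg-vswitch -A "VM iSCSI 2" vSwitch2      (recreate it on the new vSwitch)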

binoche, VMware VCP, Cisco CCNA

binoche
VMware Employee

Or, just as Jelle suggested: in VM iSCSI 1 and VM iSCSI 2, move vmnic5 into the active adapters as well, but keep the vmkernel port groups unchanged?

binoche, VMware VCP, Cisco CCNA

J_R__Kalf
Contributor

Or, just as Jelle suggested: in VM iSCSI 1 and VM iSCSI 2, move vmnic5 into the active adapters as well, but keep the vmkernel port groups unchanged?

binoche, VMware VCP, Cisco CCNA

Sorry if my posting of last night didn't make too much sense. Let me elaborate, based on the hardware Windows machine testing patters has done as well. Look at this typed-out picture:

Windows machine | --- | HW switch 1 | --- | HW switch 2 | --- | Storage array

If you pull a NIC cable between the Windows machine and switch 1, Windows will see a link-down and start rerouting traffic over the other NIC (picture below):

Windows machine | X | HW switch 1 | --- | HW switch 2 | --- | Storage array

But if you pull the link between switch 1 and switch 2, Windows won't get a link-down notification and it takes time before the network starts rerouting traffic, often beyond Windows' comfortable 60-second SCSI retry timeout. As shown in the text picture below:

Windows machine | --- | HW switch 1 | X | HW switch 2 | --- | Storage array

Now this logic applies to the Windows virtual machine and vSwitch as well, looking at the setup I've drawn below:

Windows VM | --- | vSwitch1 | X | HW switch 1 | --- | Storage array

Get the picture? So, by moving the secondary NIC on both of the Virtual Machine port groups you've created to Standby (or Active, but second in the priority list), ticking failover based on link status, and setting the Notify Switches option to Yes, you force the port group to actively switch between NICs on an underlying NIC failure that the Windows VM can't detect. The failback option is a nice bonus that "automagically" fails back to the vmnic configured as primary when its link becomes available again.

Jelle

I hope this explanation makes more sense

VMware VCP since 2006

--- VMware VCP since 2006 --- If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
binoche
VMware Employee

NIC teaming should help avoid a single point of failure, but as VM iSCSI 1 and VM iSCSI 2 show, both port groups have only one vmnic attached. So I suggest attaching the second vmnic; then, if one vmnic has its cable removed, NIC teaming kicks in and the remaining vmnic takes over, with no need for an ARP reset.

binoche, VMware VCP, Cisco CCNA

patters98
Enthusiast

Thanks guys. I can see you have understood the problem clearly. What I don't understand is that VMware's iSCSI software initiator is capable of failing a path easily enough, whereas the VM's MS initiator isn't under exactly the same circumstances. With your last few posts you've been concentrating on layer 2, but surely the VM couldn't care less if the vNIC's link is down - what it should be doing is checking paths at regular short intervals at layer 3, no? By having a redundant setup I'm not just guarding against an adapter failure of course - one of the SAN switches may fail, or any cable on the path. I could presumably simulate the same outage by disconnecting eth0 of the EqualLogic (leaving eth1 in the other SAN switch) and taking out the LAG port between the physical switches.

I've raised it with EqualLogic now, but again the tech I spoke to was also focussing on layer 2, and switch notification.

If I have some time today I'll test Jelle's suggestion of having a standby adapter for each VM port group. I'm pretty sure I've read several times that you should use explicit bindings to a single adapter though, so there must be a reason for that.

EDIT - is that reason perhaps that the MPIO driver examines the path at layer 2 after checking layer 3 connectivity? If you start dynamically altering that then perhaps it interferes with the path calculations...

gzulauf
Contributor

patters98-

I am very interested in the results of your test. We too are wondering how to properly set up Windows VMs with the MS iSCSI Initiator on vSphere.
