iSCSI redundancy/failover help

smoke455 · ‎09-07-2007

I have been having an intermittent problem losing my iSCSI connection on my ESX 3.1 server about once a week. So I am trying to add a redundant or failover connection to the iSCSI box.

So far I have set up the following:

iSCSI Appliance
  192.168.2.1/255.255.255.0
  192.168.3.1/255.255.255.0

ESX Server
  vSwitch3
    192.168.2.10 (Service Console)
    192.168.2.20 (VMKernel)
  vSwitch5
    192.168.3.10 (Service Console)
    192.168.3.20 (VMKernel)

Under Configuration, I clicked Properties for the iSCSI software adapter.

Under Dynamic Discovery, I have the IP addresses of both iSCSI appliance NICs listed.

Is this enough to get redundancy/failover, or am I missing something?

(I'm a noob and learning as I go on this...)
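Before relying on the two discovery addresses, it may help to confirm that each appliance address answers at all. The sketch below is only an illustration: `check_targets` is a hypothetical helper, and it assumes the ping command you pass in (for example `vmkping` on the ESX service console) exits non-zero when the target does not reply.

```shell
#!/bin/sh
# Hypothetical helper: try each address with the given ping command
# and report whether it answered. The first argument is the command
# to use, so the same function works with vmkping or plain ping.
check_targets() {
    ping_cmd="$1"
    shift
    for ip in "$@"; do
        if "$ping_cmd" "$ip" > /dev/null 2>&1; then
            echo "$ip reachable"
        else
            echo "$ip UNREACHABLE"
        fi
    done
}

# Example call for the setup described above (run on the ESX host):
#   check_targets vmkping 192.168.2.1 192.168.3.1
```

A reply on both addresses only shows basic connectivity on each subnet; it does not by itself prove that failover will work.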

christianZ · ‎09-08-2007

What iSCSI storage are you using?

I don't think you can get failover just by configuring two target IP addresses.

smoke455 · ‎09-08-2007

> What iscsi storage are you using?

It's a miSAN iSCSI unit from Cybernetics.

> I suppose you can't do failover by configuring 2 target's ip addresses.

Well, the first thing I tried was to set the two NICs in the iSCSI unit to 192.168.2.1 and 192.168.2.2, but the iSCSI unit complains that the addresses can't be on the same subnet.

So I set up two different VLANs and put each NIC in its own VLAN. I set up the VLANs on two switches and created two vSwitches in ESX. Since iSCSI on ESX won't list two paths to the same iSCSI target, I wasn't sure that it was finding the target.

This morning I pulled the network cable on the 192.168.2.1 nic and it didn't miss a beat switching over to the 192.168.3.1 nic. So I guess it does work.

I'm still trying to find if ESX keeps a log of when one iSCSI link dies and it starts using the other.

christianZ · ‎09-08-2007

To really test the failover, you should have a VM running on the iSCSI volume (e.g. one with a visibly running clock) and then pull the cable.
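One way to make the "running clock" idea measurable is a small heartbeat script inside the guest. This is only a sketch and assumes a guest with a POSIX shell and a `date` that supports `+%s`; the function names and the heartbeat file path are made up for illustration.

```shell
#!/bin/sh
# Append one epoch timestamp per second to a file on the iSCSI-backed disk.
# If disk I/O stalls during failover, the write blocks and a gap appears.
heartbeat() {
    out="$1"
    count="$2"
    i=0
    while [ "$i" -lt "$count" ]; do
        date +%s >> "$out"
        sync
        sleep 1
        i=$((i + 1))
    done
}

# Print the largest gap, in seconds, between consecutive timestamps.
max_gap() {
    awk 'NR > 1 { d = $1 - prev; if (d > max) max = d }
         { prev = $1 }
         END { print max + 0 }' "$1"
}

# Example (inside the guest): start the heartbeat, pull the cable, then
#   heartbeat /data/heartbeat.log 120
#   max_gap /data/heartbeat.log
```

A largest gap close to 1 suggests that I/O never paused noticeably; a larger value gives a rough idea of how long the guest was stalled.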

You can check /var/log/vmkernel and /var/log/messages for warnings/errors.
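A minimal sketch for filtering those logs is shown here. `iscsi_events` is a hypothetical helper, and the default pattern `iscsi` is an assumption, since the exact message text depends on the ESX build; pass a different pattern as the second argument if needed.

```shell
#!/bin/sh
# Hypothetical helper: print lines from a log file that match a
# case-insensitive pattern (default "iscsi"). The pattern is an
# assumption; adjust it to what your logs actually contain.
iscsi_events() {
    logfile="$1"
    pattern="${2:-iscsi}"
    grep -i -E "$pattern" "$logfile"
}

# Example calls on the ESX service console:
#   iscsi_events /var/log/vmkernel
#   iscsi_events /var/log/messages 'iscsi|warning'
```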

Can you post the output of the following commands here:

esxcfg-mpath -l

esxcfg-vmknic -l

esxcfg-vswif -l

esxscg-vswitch -l

smoke455 · ‎09-08-2007

> To really test the failover you should have a running vm on iscsi volume (e.g. with running clock) and then pull the cable.

I did keep one NetWare server running at the DRDOS level - it could still run commands and load NetWare after I pulled the cable.

> You can check /var/log/vmkernel and /var/log/messages for warnings/errors.

Thanks.

> Can you post here following:
> esxcfg-mpath -l
> esxcfg-vmknic -l
> esxcfg-vswif -l
> esxscg-vswitch -l

esxcfg-mpath -l

Disk vmhba0:0:0 /dev/sda (152587MB) has 1 paths and policy of Fixed
 Local 2:8.0 vmhba0:0:0 On active preferred

Disk vmhba40:0:0 /dev/sdc (1430448MB) has 1 paths and policy of Fixed
 iScsi sw iqn.1998-01.com.vmware:cwg157-160f7576<->iqn.2007-06.com.cybernetics:17896443bd2666ddda377ea4b96fd6cf.vdisk2 vmhba40:0:0 On active preferred

esxcfg-vmknic -l

Port Group   IP Address     Netmask         Broadcast       MAC Address         MTU    Enabled
VMkernel 2   192.168.3.20   255.255.255.0   192.168.3.255   00:50:56:64:9f:67   1514   true
VMkernel     192.168.2.20   255.255.255.0   192.168.2.255   00:50:56:6f:88:d8   1514   true

esxcfg-vswif -l

Name     Port Group          IP Address     Netmask         Broadcast       Enabled   DHCP
vswif0   Service Console     10.10.75.25    255.255.255.0   10.10.75.255    true      false
vswif1   Service Console 2   192.168.2.10   255.255.255.0   192.168.2.255   true      false
vswif2   Service Console 3   192.168.3.10   255.255.255.0   192.168.3.255   true      false

esxscg-vswitch -l

-bash: esxscg-vswitch: command not found
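The `esxcfg-mpath -l` listing above reports one path for the iSCSI disk, which matches the earlier observation that ESX does not list two paths to the same target. As a rough sketch, the path count per disk can be pulled out of that output with awk. `path_counts` is a hypothetical helper, and it assumes the "Disk ... has N paths" line format shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: read `esxcfg-mpath -l` output on stdin and
# print "<disk> <path count>" for each "Disk ... has N paths" line.
path_counts() {
    awk '$1 == "Disk" && $5 == "has" && $7 == "paths" { print $2, $6 }'
}

# Example call on the ESX service console:
#   esxcfg-mpath -l | path_counts
```

For the output posted here, this would list both `vmhba0:0:0` and `vmhba40:0:0` with a count of 1.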

christianZ · ‎09-09-2007

Sorry, the last command should be:

esxcfg-vswitch -l

Did you pull the cable from the host or from the iSCSI storage?


smoke455 · ‎09-09-2007

I pulled the cable from the iSCSI appliance. The vmkernel log showed it took 3 seconds to switch over to the other nic and resume using the iSCSI LUN on the other address.

esxcfg-vswitch -l

Switch Name   Num Ports   Used Ports   Configured Ports   Uplinks
vSwitch0      32          3            32                 vmnic0