RobertNL
Contributor

iSCSI issues: slow startup with MD3000i and two Dell 2950s

Hi,

We currently have two servers connected directly to an MD3000i with two dual-port controllers. One 2950 is connected to one port on controller 0 and one port on controller 1 of the MD3000i (ports 0:0 and 1:0).

The other server is on ports 0:1 and 1:1.

The boot sequence is extremely slow. It stalls at the step that says "Restoring S/W iscsi volumes" and sits there for 5 minutes or more.

It does start in the end, and everything works, but it is very odd.

Also, I notice the following lines repeating in the VMkernel log:

Dec 10 10:03:11 tlesx2 vmkernel: 0:01:06:58.153 cpu3:1079)<5>iSCSI: session 0x9e2c1b0 iSCSI: session 0x9e2c1b0 retrying all the portals again, since the portal list got exhausted

Dec 10 10:03:11 tlesx2 vmkernel: 0:01:06:58.153 cpu3:1079)iSCSI: session 0x9e2c1b0 to iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662 waiting 60 seconds before next login attempt

Dec 10 10:03:56 tlesx2 vmkernel: 0:01:07:43.154 cpu3:1075)iSCSI: bus 0 target 0 trying to establish session 0x9e03f90 to portal 0, address 172.17.2.2 port 3260 group 1

Dec 10 10:04:11 tlesx2 vmkernel: 0:01:07:58.155 cpu3:1079)iSCSI: bus 0 target 2 trying to establish session 0x9e2c1b0 to portal 0, address 172.17.2.130 port 3260 group 2

Dec 10 10:04:11 tlesx2 vmkernel: 0:01:07:58.159 cpu3:1073)iSCSI: login phase for session 0x9e03f90 (rx 1075, tx 1074) timed out at 407816, timeout was set for 407816

Dec 10 10:04:11 tlesx2 vmkernel: 0:01:07:58.159 cpu3:1075)iSCSI: session 0x9e03f90 connect timed out at 407816

Dec 10 10:04:11 tlesx2 vmkernel: 0:01:07:58.159 cpu3:1075)<5>iSCSI: session 0x9e03f90 iSCSI: session 0x9e03f90 retrying all the portals again, since the portal list got exhausted

Dec 10 10:04:11 tlesx2 vmkernel: 0:01:07:58.159 cpu3:1075)iSCSI: session 0x9e03f90 to iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662 waiting 60 seconds before next login attempt

Dec 10 10:04:26 tlesx2 vmkernel: 0:01:08:13.160 cpu3:1073)iSCSI: login phase for session 0x9e2c1b0 (rx 1079, tx 1078) timed out at 409316, timeout was set for 409316

Dec 10 10:04:26 tlesx2 vmkernel: 0:01:08:13.160 cpu3:1079)iSCSI: session 0x9e2c1b0 connect timed out at 409316
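
To see which portals are actually failing, the repeating pattern above can be summarised per portal. The following is a minimal Python sketch, assuming the VMkernel lines are worded exactly as quoted in this post (other ESX builds may word them differently); the function name `timeouts_per_portal` is hypothetical. It remembers which address each session last tried and counts the "connect timed out" lines against that address.

```python
import re
from collections import Counter

# Patterns modelled on the vmkernel lines quoted above (an assumption for other builds).
TRY_RE = re.compile(
    r"establish session (0x[0-9a-f]+) to portal \d+, address ([\d.]+) port (\d+)")
TIMEOUT_RE = re.compile(r"session (0x[0-9a-f]+) connect timed out")


def timeouts_per_portal(lines):
    """Count 'connect timed out' events per target portal (address:port).

    Each timeout is attributed to the portal its session most recently tried.
    """
    last_portal = {}  # session id -> "address:port"
    counts = Counter()
    for line in lines:
        m = TRY_RE.search(line)
        if m:
            session, addr, port = m.groups()
            last_portal[session] = f"{addr}:{port}"
            continue
        m = TIMEOUT_RE.search(line)
        if m:
            counts[last_portal.get(m.group(1), "unknown")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "iSCSI: bus 0 target 0 trying to establish session 0x9e03f90 to portal 0, "
        "address 172.17.2.2 port 3260 group 1",
        "iSCSI: bus 0 target 2 trying to establish session 0x9e2c1b0 to portal 0, "
        "address 172.17.2.130 port 3260 group 2",
        "iSCSI: session 0x9e03f90 connect timed out at 407816",
        "iSCSI: session 0x9e2c1b0 connect timed out at 409316",
    ]
    for portal, n in timeouts_per_portal(sample).items():
        print(portal, n)
```

Fed with the whole VMkernel log instead of the four sample lines, a portal that never stops timing out stands out immediately.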

And when I do a rescan of the iSCSI adapter I get the following:

Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.085 cpu1:1039)ScsiScan: 395: Path 'vmhba32:C0:T1:L1': Vendor: 'DELL ' Model: 'MD3000i ' Rev: '0670'

Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.085 cpu1:1039)ScsiScan: 396: Type: 0x0, ANSI rev: 5

Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.086 cpu1:1039)WARNING: ScsiUid: 608: Path 'vmhba32:C0:T1:L1': Found more than one SNS id.

Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.088 cpu1:1039)ScsiScan: 395: Path 'vmhba32:C0:T3:L1': Vendor: 'DELL ' Model: 'MD3000i ' Rev: '0670'

Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.088 cpu1:1039)ScsiScan: 396: Type: 0x0, ANSI rev: 5

Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.090 cpu1:1039)WARNING: ScsiUid: 608: Path 'vmhba32:C0:T3:L1': Found more than one SNS id.

Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.093 cpu3:1040)SCSI: 861: GetInfo for adapter vmhba32, , max_vports=0, vports_inuse=0, linktype=0, state=0, failreason=0, rv=-1, sts=bad001f

Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.093 cpu3:1040)iSCSI: session 0x9e03f90 replacement timed out, failing to queue 0x6a11380 cdb 0xa0 and any following commands to (1 0 0 0), iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662

Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.097 cpu3:1040)iSCSI: session 0x9e2c1b0 replacement timed out, failing to queue 0x6a11380 cdb 0xa0 and any following commands to (1 0 2 0), iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662

Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.106 cpu1:1041)SCSI: 861: GetInfo for adapter vmhba32, , max_vports=0, vports_inuse=0, linktype=0, state=0, failreason=0, rv=-1, sts=bad001f

Dec 10 10:31:08 tlesx2 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba32:C0:T1:L1 : 0x0 0x80 0x83 0x85 0xc0 0xc1 0xc2 0xc3 0xc4 0xc5 0xc7 0xc8 0xc9 0xca 0xd0

Dec 10 10:31:08 tlesx2 vmkernel: VMWARE SCSI Id: Device id info for vmhba32:C0:T1:L1: 0x1 0x3 0x0 0x10 0x60 0x2 0x21 0x90 0x0 0x92 0x41 0xcc 0x0 0x0 0x10 0x4a 0x49 0x3c 0xa8 0xec 0x53 0x98 0x0 0x54 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x

Dec 10 10:31:08 tlesx2 vmkernel: 6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x32 0x32 0x31 0x39 0x30 0x30 0x30 0x39 0x32 0x34 0x31 0x63 0x63 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x39 0x32 0x65 0x63 0x36 0x36 0x32 0x2c 0x74 0x2c 0x30 0x78 0x30

Dec 10 10:31:08 tlesx2 vmkernel: 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x31 0x30 0x31 0x0 0x0 0x0 0x51 0x94 0x0 0x4 0x0 0x0 0x0 0x2 0x53 0xa8 0x0 0x44 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0

Dec 10 10:31:08 tlesx2 vmkernel: x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x32 0x32 0x31 0x39 0x30 0x30 0x30 0x39 0x32 0x34 0x31 0x63 0x63 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x39 0x32 0x65 0x63 0x36 0x36 0x32 0x0 0x0 0x0 0x0

Dec 10 10:31:08 tlesx2 vmkernel: VMWARE SCSI Id: Id for vmhba32:C0:T1:L1 0x60 0x02 0x21 0x90 0x00 0x92 0x41 0xcc 0x00 0x00 0x10 0x4a 0x49 0x3c 0xa8 0xec 0x4d 0x44 0x33 0x30 0x30 0x30

Dec 10 10:31:08 tlesx2 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba32:C0:T3:L1 : 0x0 0x80 0x83 0x85 0xc0 0xc1 0xc2 0xc3 0xc4 0xc5 0xc7 0xc8 0xc9 0xca 0xd0

Dec 10 10:31:09 tlesx2 vmkernel: VMWARE SCSI Id: Device id info for vmhba32:C0:T3:L1: 0x1 0x3 0x0 0x10 0x60 0x2 0x21 0x90 0x0 0x92 0x41 0xcc 0x0 0x0 0x10 0x4a 0x49 0x3c 0xa8 0xec 0x53 0x98 0x0 0x54 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x

Dec 10 10:31:09 tlesx2 vmkernel: 6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x32 0x32 0x31 0x39 0x30 0x30 0x30 0x39 0x32 0x34 0x31 0x63 0x63 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x39 0x32 0x65 0x63 0x36 0x36 0x32 0x2c 0x74 0x2c 0x30 0x78 0x30

Dec 10 10:31:09 tlesx2 vmkernel: 0x30 0x30 0x31 0x30 0x30 0x30 0x30 0x30 0x31 0x30 0x32 0x0 0x0 0x0 0x51 0x94 0x0 0x4 0x0 0x0 0x0 0x2 0x53 0xa8 0x0 0x44 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0

Dec 10 10:31:09 tlesx2 vmkernel: x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x32 0x32 0x31 0x39 0x30 0x30 0x30 0x39 0x32 0x34 0x31 0x63 0x63 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x39 0x32 0x65 0x63 0x36 0x36 0x32 0x0 0x0 0x0 0x0

Dec 10 10:31:09 tlesx2 vmkernel: VMWARE SCSI Id: Id for vmhba32:C0:T3:L1 0x60 0x02 0x21 0x90 0x00 0x92 0x41 0xcc 0x00 0x00 0x10 0x4a 0x49 0x3c 0xa8 0xec 0x4d 0x44 0x33 0x30 0x30 0x30
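
The "Device id info" bytes appear to be the designator list from VPD page 0x83 (Device Identification, which is in the supported-pages list above). The log wraps them across lines, sometimes in the middle of a token ("0x65 0x" followed by "6c"), so those fragments have to be rejoined first. A minimal Python sketch for turning the hex into readable text; the helper name `decode_hex_tokens` is hypothetical.

```python
def decode_hex_tokens(text):
    """Decode whitespace-separated '0xNN' tokens into a printable string.

    Bytes outside printable ASCII (such as the binary NAA identifier and the
    designator headers) are shown as '.', so their positions stay visible.
    """
    chars = []
    for token in text.split():
        value = int(token, 16)  # int() accepts the '0x' prefix when base is 16
        chars.append(chr(value) if 0x20 <= value < 0x7F else ".")
    return "".join(chars)


if __name__ == "__main__":
    # Tokens from the T1 dump above, with the split "0x" / "6c" rejoined as 0x6c.
    print(decode_hex_tokens("0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e "
                            "0x63 0x6f 0x6d 0x2e 0x64 0x65 0x6c 0x6c"))
    # prints iqn.1984-05.com.dell
```

Decoded this way, both paths report the same array name, iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662, but different target port names ending in ",t,0x000000000101" for T1 and ",t,0x000100000102" for T3. Whether that is what triggers "Found more than one SNS id" is a guess; the log does not say so.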

Does anyone have an idea where I should look?

Texiwill
Leadership

Hello,

Look at the logs on your MD3000i as well. Does your SC (service console) participate in the iSCSI network?


Best regards,

Edward L. Haletky

VMware Communities User Moderator

====

Author of the book 'VMWare ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.

Blue Gears and SearchVMware Pro Blogs: http://www.astroarch.com/wiki/index.php/Blog_Roll

Top Virtualization Security Links: http://www.astroarch.com/wiki/index.php/Top_Virtualization_Security_Links

--
Edward L. Haletky
vExpert XIII: 2009-2021,
VMTN Community Moderator
vSphere Upgrade Saga: https://www.astroarch.com/blogs
GitHub Repo: https://github.com/Texiwill
RobertNL
Contributor

Yes.

The service console is configured on each VMkernel port.

vSwitch1 and vSwitch2 are each in a different subnet.

I added a picture.

The MD3000i event log seems ok. No errors, and I have SCSI sessions.

planetelex
Contributor

Hi, I am having similar issues with an MD3000i too. Did you ever work this out? Thanks

bobross
Hot Shot

This looks suspiciously like a boot-timing issue. It is trying several paths because it finds multiple entries in the SNS. But I doubt that multipathing is loaded, or can be achieved, at this point, so the superfluous path(s) keep timing out. They are recognized, but cannot be initiated since this is boot time.

I would suggest backing down to a single path, trying to boot, and seeing what happens.

mike_laspina
Champion

Hi,

It looks like a gateway/network configuration issue.

I would suggest you define one IP address for the iSCSI initiator and team the two NICs in a single vSwitch configuration.

If the MD3000i target has two IPs, there is no need to use two on the ESX server, as you will still get two paths.

http://blog.laspina.ca/ vExpert 2009
RobertNL
Contributor

Hi,

No, not yet.

We are currently trying to find out the best way to monitor this device. Its SNMP capabilities are, to put it as politely as possible, extremely disappointing.

E-mail alerting is effectively nonexistent: as far as I can see it is badly implemented and does not compose correct e-mail, because our mail server receives the messages but cannot make any sense of them. But that's not a problem for here :)

I'll try to look into the tips posted above.

bazzS
Contributor

I have exactly the same issue: an MD3000i, two PE2950s and ESX 3.5 U3.

Each ESX host has only one vmnic, attached to one switch, and there are two connections from that switch to controller ports 0_0 and 1_1, all in the same subnet and the same VLAN. Same crap.

Rescanning the vmhba is slow, and so is booting. The vmkernel log shows the iSCSI sessions constantly timing out while it exhausts the two other target portals (0_1 and 1_0: different subnet, different switch, different VLAN), which neither ESX host can even see! That's why it's so slow.

My issue is that it should not even be looking for the other two ports. I conclude that all ports are published when you attach to any port (I have verified this with a Windows box using the iSCSI initiator on one port: it is aware of the others). To me this is either an issue with the MD3000i (unlikely, as this doesn't happen with ESX 3i U3) or a bug in the ESX iSCSI initiator in 3.5 U3 full... jeez, that would really be a surprise!
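
This matches how SendTargets discovery works: the array can answer with every portal it has, as TargetName= and TargetAddress=address:port,tag lines (the RFC 3720 text format), regardless of which portal the initiator asked. A minimal Python sketch that parses such a response and flags portals outside the initiator's own subnets; the function names, the 10.10.10.x portals and the 172.17.2.0/24 subnet are hypothetical (only the IQN and the 172.17.2.2 / 172.17.2.130 addresses come from the logs in this thread), and IPv6 portals are not handled.

```python
import ipaddress


def portals_from_sendtargets(text):
    """Parse SendTargets-style text into (target, ip, port, tpgt) tuples.

    Expects key=value lines: TargetName=iqn... followed by one or more
    TargetAddress=ip:port,tpgt lines. IPv4 only.
    """
    portals = []
    target = None
    for line in text.splitlines():
        key, _, value = line.strip().partition("=")
        if key == "TargetName":
            target = value
        elif key == "TargetAddress" and target is not None:
            address, _, tpgt = value.partition(",")
            host, _, port = address.partition(":")
            portals.append((target, host, int(port or 3260), int(tpgt)))
    return portals


def unreachable_portals(portals, initiator_networks):
    """Return portals whose IP lies outside every network the initiator sits on."""
    nets = [ipaddress.ip_network(n) for n in initiator_networks]
    return [p for p in portals
            if not any(ipaddress.ip_address(p[1]) in net for net in nets)]


if __name__ == "__main__":
    # Hypothetical discovery response for an array with four portals.
    response = "\n".join([
        "TargetName=iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662",
        "TargetAddress=172.17.2.2:3260,1",
        "TargetAddress=172.17.2.130:3260,2",
        "TargetAddress=10.10.10.2:3260,1",
        "TargetAddress=10.10.10.130:3260,2",
    ])
    for target, ip, port, tpgt in unreachable_portals(
            portals_from_sendtargets(response), ["172.17.2.0/24"]):
        print(f"{ip}:{port} (group {tpgt}) is not on the initiator's subnet")
```

Any portal this flags is one the initiator will keep trying and timing out on, which is the behaviour described above.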

I have even tried this directly attached and I get the same crap. Incidentally, I do not have this issue with the same hardware and ESX 3i!!

Has anyone found a way to mask the extra ports from the iSCSI initiator? Come on VMware, mate, sort out your iSCSI stack.

RobertNL
Contributor

I just got off the phone with support, and they did not see anything abnormal. They checked the iSCSI configuration and it seemed fine.

I guess it is something we have to live with, then. Still, I think it is odd.

SNMP has been taken care of. We use the command-line tool to request the status from within Nagios. This does the job.

AlbertWT
Virtuoso

Hello there,

How was your problem resolved?

I am having a performance problem too and am wondering how you sorted it out.

See the following deployment diagram and the performance comparison.

Throughput and linear read are slower on the 15k rpm iSCSI SAN.

Kind Regards,

AWT

/* Any kind of comment or input would be greatly appreciated */
JohnADCO
Expert

I think your issue is one of mindset.

DAS as opposed to SAN.

On the SAN, your random numbers and IOPS are by far the most important. What performance issue are you really having?

Does your Exchange stink? SQL boxes? Normal flat-file server duties?

It looks to me like your SAN is providing an overall better score?

AlbertWT
Virtuoso

You could be right, John.

I was thinking that a direct connection to the SAN would perform faster, without needing yet another switch in between (plus VLAN trunking to implement link aggregation).

The only problem is that I was looking at the wrong number: sequential access throughput (barking up the wrong tree :-0 ).

Anyway, you've convinced me that random access throughput is more important for database operations and application servers.
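
The distinction John draws comes down to IO size: random workloads issue many small IOs and are quoted in IOPS, while sequential tests issue large IOs and are quoted in MB/s; the two are linked by throughput = IOPS × IO size. A small Python sketch of that arithmetic, including the ceiling a single 1 Gb/s iSCSI link puts on IOPS. The function names and IO sizes are illustrative, not measurements from this thread, and protocol overhead is ignored, so real figures will be lower.

```python
def throughput_mb_s(iops, io_size_kib):
    """Throughput in MB/s (10**6 bytes per second) for an IO rate and IO size."""
    return iops * io_size_kib * 1024 / 1e6


def max_iops_on_link(link_gbit_s, io_size_kib):
    """Upper bound on IOPS a link can carry at a given IO size.

    Ignores Ethernet, TCP/IP and iSCSI overhead, so real limits are lower.
    """
    link_bytes_s = link_gbit_s * 1e9 / 8
    return link_bytes_s / (io_size_kib * 1024)


if __name__ == "__main__":
    for size in (4, 64, 256):
        print(f"{size:>3} KiB IOs: at most {max_iops_on_link(1, size):8.0f} IOPS on 1 Gb/s")
```

A 1 Gb/s link carries at most 125 MB/s, which is roughly 1,900 IOPS at 64 KiB but about 30,500 IOPS at 4 KiB, so a sequential benchmark hits the wire limit long before a random small-block workload does.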

Thanks for your comments John.

Kind Regards,

AWT

RobertNL
Contributor

We really did not "resolve" the problem.

VMware looked at the config and told me everything seems to be in order. The slow startup is annoying when you reboot the server, but once it runs, it runs fine, with no performance issues whatsoever. It is just annoying if you have to reboot the server for whatever reason: it takes forever before the storage is recognized.

Unfortunately we don't have the VMware edition with VMotion. That would make life a bit easier though :)

So VMware says it is normal behaviour. I suppose it is an iSCSI thing. Or maybe Update 4 or ESX 4 handles this better.

JohnADCO
Expert

We have to stagger our VMs coming up; otherwise it is painfully slow. I think that is the one place where the 1 Gb limit definitely hurts you.

Loading one VM at a time seems fine; XP always seems to boot up amazingly fast.

AndreTheGiant
Immortal

It is just annoying if you have to reboot the server for whatever reason, it takes forever before the storage is recognized

How much time is forever? :)

Have you also tried calling Dell support?

Andre

**if you found this or any other answer useful please consider allocating points for helpful or correct answers

Andre | http://about.me/amauro | http://vinfrastructure.it/ | @Andrea_Mauro
RobertNL
Contributor

5 to 10 minutes. And yes, I called Dell Support and they said I need to contact VMware, and VMware says they don't see any odd configuration.

At that point we are kind of done. Mind you, when it is running, it runs perfectly!

We have around VMs running rather happily in our production environment, with no problems whatsoever. Just the reboot takes forever. Maybe an update from Update 3 to Update 4 would help. Not sure.

AndreTheGiant
Immortal

5 to 10 minutes.

Too much.

From each ESX host, can you both ping and vmkping each of your targets?

(The MD3000i has two target IPs per controller.)
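
For the iSCSI port itself, a rough complement to ping is a plain TCP connect to port 3260 on each target IP. This is a minimal Python sketch and only an approximation: it runs from whichever machine you start it on (for example a box on the iSCSI subnet), not from the VMkernel interface that vmkping tests, so it cannot replace vmkping. The function name `portal_reachable` is hypothetical.

```python
import socket


def portal_reachable(host, port=3260, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Connection refused, unreachable networks and timeouts all raise OSError
    (socket.timeout is a subclass of it), and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example: 172.17.2.2 and 172.17.2.130 appear in the log at the top of this
# thread; add the remaining MD3000i target IPs for your own setup.
# for ip in ["172.17.2.2", "172.17.2.130"]:
#     print(ip, "reachable" if portal_reachable(ip) else "NOT reachable")
```

A portal that answers ping but fails this check points at the iSCSI service or a firewall rather than routing.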

Have you disabled CHAP authentication? Use the IQN name or IP instead.

Andre

JohnADCO
Expert

So basically your host is up and it takes 5 to 10 min to see the storage afterwards? Very strange.

mike_laspina
Champion

Hi,

I see two possible causes.

1. A previously discovered target that is no longer present still exists in /var/lib/iscsi/vmkbindings.

2. A target portal is not defined, and the discovery process is waiting for a response from a target IP interface that the Service Console cannot reach.
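
The timestamps in the original log give a feel for how an unreachable portal adds up under cause 2. For session 0x9e03f90 the login attempt starts at 10:03:56 and times out at 10:04:11, after which the initiator logs "waiting 60 seconds before next login attempt". A small Python sketch of that arithmetic, assuming each failed cycle costs one login timeout plus the 60-second wait and that the boot step waits on these cycles back to back (an assumption; the log does not prove it). The names are hypothetical.

```python
from datetime import datetime

RETRY_WAIT_S = 60  # from "waiting 60 seconds before next login attempt"


def seconds_between(start, end, fmt="%H:%M:%S"):
    """Elapsed seconds between two same-day timestamps such as '10:03:56'."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Session 0x9e03f90: attempt at 10:03:56, "login phase ... timed out" at 10:04:11.
LOGIN_TIMEOUT_S = seconds_between("10:03:56", "10:04:11")
CYCLE_S = LOGIN_TIMEOUT_S + RETRY_WAIT_S  # one failed attempt plus the wait


def failed_cycles(delay_s, cycle_s=CYCLE_S):
    """How many timeout-plus-wait cycles fit into an observed delay (assumed back to back)."""
    return delay_s / cycle_s


if __name__ == "__main__":
    for minutes in (5, 10):
        print(f"{minutes} min ~ {failed_cycles(minutes * 60):.0f} failed cycles of {CYCLE_S:.0f} s")
```

Under those assumptions, the 5 to 10 minute delay reported in this thread corresponds to roughly 4 to 8 failed 75-second cycles.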
