Hi,
We currently have two servers connected directly to an MD3000i with two dual-port controllers. One 2950 is connected to one port on controller 0 and one port on controller 1 on the MD3000i (0:0 and 1:0).
The other server is on 0:1 and 1:1.
The boot sequence is extremely slow. It times out at the part where it says "Restoring S/W iSCSI volumes" and sits there for 5 minutes or more.
It does start in the end, and it works, but it is very odd.
Also in the VMkernel log I notice the following lines keep repeating:
Dec 10 10:03:11 tlesx2 vmkernel: 0:01:06:58.153 cpu3:1079)<5>iSCSI: session 0x9e2c1b0 iSCSI: session 0x9e2c1b0 retrying all the portals again, since the portal list got exhausted
Dec 10 10:03:11 tlesx2 vmkernel: 0:01:06:58.153 cpu3:1079)iSCSI: session 0x9e2c1b0 to iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662 waiting 60 seconds before next login attempt
Dec 10 10:03:56 tlesx2 vmkernel: 0:01:07:43.154 cpu3:1075)iSCSI: bus 0 target 0 trying to establish session 0x9e03f90 to portal 0, address 172.17.2.2 port 3260 group 1
Dec 10 10:04:11 tlesx2 vmkernel: 0:01:07:58.155 cpu3:1079)iSCSI: bus 0 target 2 trying to establish session 0x9e2c1b0 to portal 0, address 172.17.2.130 port 3260 group 2
Dec 10 10:04:11 tlesx2 vmkernel: 0:01:07:58.159 cpu3:1073)iSCSI: login phase for session 0x9e03f90 (rx 1075, tx 1074) timed out at 407816, timeout was set for 407816
Dec 10 10:04:11 tlesx2 vmkernel: 0:01:07:58.159 cpu3:1075)iSCSI: session 0x9e03f90 connect timed out at 407816
Dec 10 10:04:11 tlesx2 vmkernel: 0:01:07:58.159 cpu3:1075)<5>iSCSI: session 0x9e03f90 iSCSI: session 0x9e03f90 retrying all the portals again, since the portal list got exhausted
Dec 10 10:04:11 tlesx2 vmkernel: 0:01:07:58.159 cpu3:1075)iSCSI: session 0x9e03f90 to iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662 waiting 60 seconds before next login attempt
Dec 10 10:04:26 tlesx2 vmkernel: 0:01:08:13.160 cpu3:1073)iSCSI: login phase for session 0x9e2c1b0 (rx 1079, tx 1078) timed out at 409316, timeout was set for 409316
Dec 10 10:04:26 tlesx2 vmkernel: 0:01:08:13.160 cpu3:1079)iSCSI: session 0x9e2c1b0 connect timed out at 409316
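For what it's worth, those repeating lines can be summarized to see how often each session is cycling through its portal list. Below is a minimal, hypothetical log-parsing sketch (the sample lines are abbreviated versions of the entries above, and the regex is only illustrative):

```python
import re
from collections import Counter

# Matches the session pointer in vmkernel iSCSI retry/timeout lines.
RETRY_RE = re.compile(
    r"iSCSI: session (0x[0-9a-f]+).*?(retrying all the portals|connect timed out)"
)

def count_retries(lines):
    """Count retry/timeout events per iSCSI session pointer."""
    counts = Counter()
    for line in lines:
        m = RETRY_RE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts

# Abbreviated sample lines modeled on the vmkernel log above.
sample = [
    "vmkernel: iSCSI: session 0x9e2c1b0 iSCSI: session 0x9e2c1b0 retrying all the portals again, since the portal list got exhausted",
    "vmkernel: iSCSI: session 0x9e03f90 connect timed out at 407816",
    "vmkernel: iSCSI: session 0x9e2c1b0 connect timed out at 409316",
]
print(count_retries(sample))  # → Counter({'0x9e2c1b0': 2, '0x9e03f90': 1})
```

Feeding the real /var/log/vmkernel through this shows at a glance whether one session (i.e. one portal group) is doing all the retrying.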
And when I do a rescan of the iSCSI adapter I get the following:
Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.085 cpu1:1039)ScsiScan: 395: Path 'vmhba32:C0:T1:L1': Vendor: 'DELL ' Model: 'MD3000i ' Rev: '0670'
Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.085 cpu1:1039)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.086 cpu1:1039)WARNING: ScsiUid: 608: Path 'vmhba32:C0:T1:L1': Found more than one SNS id.
Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.088 cpu1:1039)ScsiScan: 395: Path 'vmhba32:C0:T3:L1': Vendor: 'DELL ' Model: 'MD3000i ' Rev: '0670'
Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.088 cpu1:1039)ScsiScan: 396: Type: 0x0, ANSI rev: 5
Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.090 cpu1:1039)WARNING: ScsiUid: 608: Path 'vmhba32:C0:T3:L1': Found more than one SNS id.
Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.093 cpu3:1040)SCSI: 861: GetInfo for adapter vmhba32, , max_vports=0, vports_inuse=0, linktype=0, state=0, failreason=0, rv=-1, sts=bad001f
Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.093 cpu3:1040)iSCSI: session 0x9e03f90 replacement timed out, failing to queue 0x6a11380 cdb 0xa0 and any following commands to (1 0 0 0), iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662
Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.097 cpu3:1040)iSCSI: session 0x9e2c1b0 replacement timed out, failing to queue 0x6a11380 cdb 0xa0 and any following commands to (1 0 2 0), iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662
Dec 10 10:31:08 tlesx2 vmkernel: 0:01:34:55.106 cpu1:1041)SCSI: 861: GetInfo for adapter vmhba32, , max_vports=0, vports_inuse=0, linktype=0, state=0, failreason=0, rv=-1, sts=bad001f
Dec 10 10:31:08 tlesx2 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba32:C0:T1:L1 : 0x0 0x80 0x83 0x85 0xc0 0xc1 0xc2 0xc3 0xc4 0xc5 0xc7 0xc8 0xc9 0xca 0xd0
Dec 10 10:31:08 tlesx2 vmkernel: VMWARE SCSI Id: Device id info for vmhba32:C0:T1:L1: 0x1 0x3 0x0 0x10 0x60 0x2 0x21 0x90 0x0 0x92 0x41 0xcc 0x0 0x0 0x10 0x4a 0x49 0x3c 0xa8 0xec 0x53 0x98 0x0 0x54 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x
Dec 10 10:31:08 tlesx2 vmkernel: 6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x32 0x32 0x31 0x39 0x30 0x30 0x30 0x39 0x32 0x34 0x31 0x63 0x63 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x39 0x32 0x65 0x63 0x36 0x36 0x32 0x2c 0x74 0x2c 0x30 0x78 0x30
Dec 10 10:31:08 tlesx2 vmkernel: 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x31 0x30 0x31 0x0 0x0 0x0 0x51 0x94 0x0 0x4 0x0 0x0 0x0 0x2 0x53 0xa8 0x0 0x44 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0
Dec 10 10:31:08 tlesx2 vmkernel: x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x32 0x32 0x31 0x39 0x30 0x30 0x30 0x39 0x32 0x34 0x31 0x63 0x63 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x39 0x32 0x65 0x63 0x36 0x36 0x32 0x0 0x0 0x0 0x0
Dec 10 10:31:08 tlesx2 vmkernel: VMWARE SCSI Id: Id for vmhba32:C0:T1:L1 0x60 0x02 0x21 0x90 0x00 0x92 0x41 0xcc 0x00 0x00 0x10 0x4a 0x49 0x3c 0xa8 0xec 0x4d 0x44 0x33 0x30 0x30 0x30
Dec 10 10:31:08 tlesx2 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba32:C0:T3:L1 : 0x0 0x80 0x83 0x85 0xc0 0xc1 0xc2 0xc3 0xc4 0xc5 0xc7 0xc8 0xc9 0xca 0xd0
Dec 10 10:31:09 tlesx2 vmkernel: VMWARE SCSI Id: Device id info for vmhba32:C0:T3:L1: 0x1 0x3 0x0 0x10 0x60 0x2 0x21 0x90 0x0 0x92 0x41 0xcc 0x0 0x0 0x10 0x4a 0x49 0x3c 0xa8 0xec 0x53 0x98 0x0 0x54 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x
Dec 10 10:31:09 tlesx2 vmkernel: 6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x32 0x32 0x31 0x39 0x30 0x30 0x30 0x39 0x32 0x34 0x31 0x63 0x63 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x39 0x32 0x65 0x63 0x36 0x36 0x32 0x2c 0x74 0x2c 0x30 0x78 0x30
Dec 10 10:31:09 tlesx2 vmkernel: 0x30 0x30 0x31 0x30 0x30 0x30 0x30 0x30 0x31 0x30 0x32 0x0 0x0 0x0 0x51 0x94 0x0 0x4 0x0 0x0 0x0 0x2 0x53 0xa8 0x0 0x44 0x69 0x71 0x6e 0x2e 0x31 0x39 0x38 0x34 0x2d 0x30 0x35 0x2e 0x63 0x6f 0x6d 0x2e 0x64 0x65 0x6c 0x6c 0x3a 0x70 0x6f 0x77 0x65 0x72 0x76 0
Dec 10 10:31:09 tlesx2 vmkernel: x61 0x75 0x6c 0x74 0x2e 0x36 0x30 0x30 0x32 0x32 0x31 0x39 0x30 0x30 0x30 0x39 0x32 0x34 0x31 0x63 0x63 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x30 0x34 0x39 0x32 0x65 0x63 0x36 0x36 0x32 0x0 0x0 0x0 0x0
Dec 10 10:31:09 tlesx2 vmkernel: VMWARE SCSI Id: Id for vmhba32:C0:T3:L1 0x60 0x02 0x21 0x90 0x00 0x92 0x41 0xcc 0x00 0x00 0x10 0x4a 0x49 0x3c 0xa8 0xec 0x4d 0x44 0x33 0x30 0x30 0x30
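As an aside, the "Device id info" hex dump above is largely printable ASCII: it embeds the target IQN (iqn.1984-05.com.dell:powervault...). A small sketch that renders such a byte sequence readable, using a short fragment modeled on the dump rather than the full data:

```python
def printable_ascii(byte_values):
    """Render VPD page 0x83 bytes, replacing non-printable bytes with dots."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in byte_values)

# Fragment modeled on the dump above: 0x69 0x71 0x6e 0x2e ... spells "iqn."
fragment = [0x69, 0x71, 0x6e, 0x2E, 0x31, 0x39, 0x38, 0x34, 0x2D, 0x30, 0x35]
print(printable_ascii(fragment))  # → iqn.1984-05
```

Decoding the full dump this way confirms which target name each path is actually reporting in its identification VPD page.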
Does anyone have an idea where I should look?
Hello,
Look at the logs on your MD3000i as well. Does your SC participate in the iSCSI network?
Best regards,
Edward L. Haletky
VMware Communities User Moderator
====
Author of the book 'VMware ESX Server in the Enterprise: Planning and Securing Virtualization Servers', Copyright 2008 Pearson Education.
Blue Gears and SearchVMware Pro Blogs: http://www.astroarch.com/wiki/index.php/Blog_Roll
Top Virtualization Security Links: http://www.astroarch.com/wiki/index.php/Top_Virtualization_Security_Links
Hi, I am having similar issues with an MD3000i also. Did you ever work this out? Thanks
This looks suspiciously like a boot-timing issue. It is trying several paths since it finds multiple entries in the SNS. But I doubt that multipathing is loaded and/or can be achieved at this point, so the superfluous path(s) keep timing out. They are recognized, but cannot be initiated since this is boot time.
I would suggest backing down to a single path, trying to boot, and seeing what happens.
Hi,
It looks like a gateway/network configuration issue.
I would suggest you define one IP address at the iSCSI initiator and team the two NICs in a single vSwitch configuration.
If the MD3000i target has two IPs, there is no need to use two at the ESX server, as you will still get two paths.
Hi,
No, not yet,
We are currently trying to find out the best way to monitor this device. Its SNMP capabilities are, to put it as politely as possible, extremely disappointing.
And e-mail alerting is effectively nonexistent, as it is implemented badly as far as I can see: it composes an e-mail that our mail server receives but cannot make any sense of. But that's not a problem for this thread.
I'll try to look into the tips listed below.
I have exactly the same issue: an MD3000i, 2x PE2950s and ESX 3.5 U3.
I have only one vmnic attached to one switch from each ESX host and two connections from the switch to controller ports 0_0 and 1_1, all on the same subnet and in the same VLAN. Same problem.
It is slow to rescan the vmhba as well as to boot up. The vmkernel log is constantly timing out the iSCSI sessions and exhausting the two other target portals (0_1 and 1_0: different subnet, different switch, different VLAN) that neither ESX host can even see, and that's why it is so slow.
My issue is that it should not even be looking for the other two ports. I conclude that all ports are published when you attach to any port (I have verified this with a Windows box using the iSCSI initiator against one port; it is aware of the others). To me this is either an issue with the MD3000i (unlikely, as this doesn't happen with ESX 3i U3) or a bug in the ESX iSCSI initiator in 3.5 U3 full... that would really be a surprise!
I have even tried this directly attached and I get the same result. Incidentally, I do not have this issue with the same hardware and ESX 3i!
Has anyone found a way to mask the extra ports from the iSCSI initiator? Come on VMware, sort out your iSCSI stack.
I just came off the phone with support, and they did not see anything abnormal. They checked the iSCSI configuration and it seemed fine.
I guess it is something we have to live with then. I still think it is odd.
SNMP has been taken care of. We use the command-line tool to request the status from within Nagios. This does the job.
Hello there,
How was your problem resolved?
I am having a performance problem too and was wondering how you sorted it out.
See the following deployment diagram and the performance comparison.
Throughput and linear read are slower on the 15k RPM iSCSI SAN.
Kind Regards,
AWT
I think your issue is mindset: DAS as opposed to SAN.
On the SAN, your random numbers and IOPS are by far the most important. What performance issue are you really having?
Does your Exchange stink? SQL boxes? Normal flat-file server duties?
It looks to me like your SAN is providing an overall better score.
You could be right, John.
I was thinking that by using a direct connection to the SAN it would perform faster, without having to put more switches in between (plus VLAN trunking to implement LAN aggregation).
The only problem here is that I was looking at the wrong number, the sequential-access throughput calculated in IO/s (barking up the wrong tree :-0).
But anyway, you have already convinced me that random-access throughput is more important for database operations and application servers.
Thanks for your comments John.
Kind Regards,
AWT
We never really "resolved" the problem.
VMware looked at the config and told me everything seems to be in order. The slow startup is annoying when you reboot the server, but once it runs, it runs fine. No performance issues whatsoever. It is just annoying that if you have to reboot the server for whatever reason, it takes forever before the storage is recognized.
Unfortunately we don't have the VMware version with VMotion. That would make life a bit easier, though.
So VMware says it is normal behaviour. I suppose it is an iSCSI thing. Or maybe Update 4 or VMware 4 handles this better.
We have to stagger our VMs coming up. Otherwise it is painfully slow. I think it is the one place where the 1 Gb limit really hurts you.
Any one VM loading at a time seems fine; XP always boots up amazingly fast, it seems.
It is just annoying if you have to reboot the server for whatever reason, it takes forever before the storage is recognized
How much time is forever?
Have you tried to call also Dell support?
Andre
**if you found this or any other answer useful please consider allocating points for helpful or correct answers
5 to 10 minutes. And yes, I called Dell support; they say I need to contact VMware, and VMware says they don't see any odd configuration.
At that point we are kind of done. Mind you, when it is running, it runs perfectly!
We have our VMs running rather happily in our production environment, with no problems whatsoever. Just the reboot takes forever. Maybe an update from Update 3 to 4 would help. Not sure.
5 to 10 minutes.
Too much.
From each ESX host, can you do both a ping and a vmkping to each of your targets?
(The MD3000i has two target IPs for each controller.)
Have you disabled CHAP authentication? Use the IQN name or IP instead.
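Where vmkping is not available (for instance, when testing from another box on the storage VLAN), a rough substitute is to verify that each target portal accepts TCP connections on the iSCSI port 3260. This is only a reachability sketch, using the portal addresses seen in the logs above; adjust for your own controllers:

```python
import socket

def portal_reachable(ip, port=3260, timeout=1.0):
    """Return True if a TCP connection to the iSCSI portal succeeds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

# Portal addresses from the vmkernel log above (adjust for your setup).
for portal in ("172.17.2.2", "172.17.2.130"):
    print(portal, "reachable" if portal_reachable(portal) else "UNREACHABLE")
```

A portal that answers TCP here but still fails the iSCSI login points at authentication or target configuration rather than the network path.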
Andre
So basically your host is up and it takes 5 to 10 min to see the storage afterwards? Very strange.
Hi,
I see two possible causes.
1. A previous discovery of a target that is no longer present still exists in /var/lib/iscsi/vmkbindings.
2. A target portal is not defined, and the discovery process is waiting for a response from a target IP interface that is not reachable by the Service Console.
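For cause 1, one way to spot stale entries is to compare the targets recorded in the bindings file against the targets you still expect. The line format below is an assumption for illustration only; check the layout of your actual vmkbindings file before acting on it:

```python
def stale_bindings(binding_lines, current_iqns):
    """Return binding lines whose fields include none of the current target IQNs.

    Assumes each binding line carries the target IQN as a whitespace-separated
    field; the real vmkbindings format may differ, so treat this as a sketch.
    """
    stale = []
    for line in binding_lines:
        fields = line.split()
        if fields and not any(f in current_iqns for f in fields):
            stale.append(line)
    return stale

# Hypothetical bindings: one live target from the logs above, one leftover.
bindings = [
    "vmhba32 0 iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662",
    "vmhba32 1 iqn.1984-05.com.dell:powervault.decommissioned-target",
]
current = {"iqn.1984-05.com.dell:powervault.60022190009241cc00000000492ec662"}
print(stale_bindings(bindings, current))
```

Any line flagged this way is a candidate for a leftover discovery that the initiator keeps retrying at boot.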
vExpert 2009