I'm currently trying to share out disks via iSCSI from Solaris. I googled around on this problem and came across several discussions around June about a flaw in the implementation that seems to match what I'm seeing.
I had originally created only a single volume, and it worked fine. After I created several others (using shareiscsi=on) I scanned again and saw that all the targets were found, but they all appeared to be clones of the original target, and it wouldn't let me add them as VMFS volumes.
I tried creating a new target with multiple LUNs via iscsitadm, but it only showed up as having a single LUN - again, the same clone.
I have another target on the same server that is being accessed fine by a Win2008 box; it can see all the different targets for what they are.
From what I understood, the problem was related to the GUIDs being the same, but I can see that the GUIDs are different on all the targets.
According to all the posts I read, this problem was resolved in OpenSolaris a while ago, but maybe it's still in the standard Solaris 10 core?
Help would be appreciated. Thanks a lot!
I think you will like OpenSolaris better than Solaris and it's supported by Sun with a contract.
It works and it will always have all the newest enhancements months before Solaris does.
izfs# uname -a
SunOS izfs 5.10 Generic_137138-09 i86pc i386 i86pc
If there's anything else I can get you that would help, let me know. Thanks!
edit: Oh, U6 is on the CD.
Ok,
You're using Update 6, which includes the Network Address Authority (NAA) functions, so it will work.
Do not use shareiscsi=on
Just use the following to create a target.
iscsitadm create target -u 0 -b /dev/zvol/rdsk/rp1/iscsi/lun0 ss1-zrp1
iscsitadm create target -u 1 -b /dev/zvol/rdsk/rp1/iscsi/lun1 ss1-zrp1
iscsitadm create target -u 2 -b /dev/zvol/rdsk/rp1/iscsi/lun2 ss1-zrp1
iscsitadm create target -u 3 -b /dev/zvol/rdsk/rp1/iscsi/lun3 ss1-zrp1
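If a zvol was previously shared with shareiscsi, switch that property off before defining the target with iscsitadm, so the automatic per-zvol target goes away. A minimal sketch, with a hypothetical dataset name:

```shell
# Hypothetical dataset name; disabling the property removes the
# automatically created per-zvol iSCSI target.
zfs set shareiscsi=off rp1/iscsi/lun0
```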
What normally causes this issue is the signaturing of the volume. When VMware sees the iqn and you signature the LUN (i.e. add the volume), it initializes the iscsitgt NAA function and creates a GUID for the LUN in the iscsitgt service manifest. If the non-NAA iscsitgt was used, the volume header would be signatured the same across multiple LUNs, which VMware treats as the same LUN over multiple targets.
Are you able to remove the previously created backing stores (zfs destroy) and reset the service manifest (clear it and import the defaults)?
Also the ESXi server will store the previous target bindings and it can be confusing if these are not cleaned up.
Have a look at my blog for more details
FYI I'm using OpenSolaris instead of Solaris 10 U6
Regards,
Mike
I'd prefer not to destroy two of the zvols - one of them never has to be touched by the VMware hosts, though. Is it possible to keep the data on those volumes, or do I need to figure out a way to copy the data to new volumes?
How would I go about clearing/importing the service manifest?
Thanks!
If you need the data, just create a new set of zvols for this work. Be sure to thin-provision them.
Leave the old ones there; you can swap the backing store into a new LUN definition once you have it working. Simply destroy one of the new zvols and rename the old one to its name while the iscsitgt service is offline.
svcadm disable iscsitgt
Use
svccfg export -a iscsitgt > /export/home/iscsitgt-manifest-backup.xml
before clearing the config and then
svccfg delete iscsitgt
svccfg import /var/svc/manifest/system/iscsi_target.xml
To clean up ESXi's bindings you need to SSH to it; if you don't have that enabled, just Google it - there are tons of how-tos.
then disable the iSCSI software adapter and
rm /var/lib/iscsi/vmkbindings
then reboot it and re-enable the iSCSI software adapter.
Thanks for the instructions.
Before I go off and do all that, though... is there a way I can tell that this will for sure work?
I ask because I did try to create a separate target with multiple LUNs before and it didn't work, and I don't get how this issue would affect a NEW target, since it shouldn't be signatured yet.
Also, your post implies that the problem is caused by having run a previous ("bad") version of iscsitgtd that doesn't apply the right signatures. I have only ever run U6.
Is it also caused by shareiscsi=on, then? Or could there be another cause?
edit: I just destroyed all the additional volumes, disabled shareiscsi, and then created a new volume and target:
izfs# zfs create -sV 250g iscsi/sandbox
izfs# iscsitadm create target -b /dev/zvol/rdsk/iscsi/sandbox vmdatastores
izfs# iscsitadm list target -v
Target: vmdatastores
iSCSI Name: iqn.1986-03.com.sun:02:08173827-ef4b-cab2-90f6-9339dd0a9027.vmdatastores
Connections: 0
ACL list:
TPGT list:
LUN information:
LUN: 0
GUID: 0
VID: SUN
PID: SOLARIS
Type: disk
Size: 250G
Backing store: /dev/zvol/rdsk/iscsi/sandbox
Status: online
so the GUID starts out as 0, then I refresh HBA on the ESXi host and ... the target disappears?
I recreated it, it gets signatured (new GUID), and then shortly afterward it disappears. What?
edit 2: Yeah, it seems like as soon as anything connects to it, it disappears shortly afterward - did the same thing with a Win2k8 box connecting to it.
Clear it again but do the following in this order.
Make sure you clear the w2k8 initiator connection. It will mess things up if it's trying to connect.
1. Remove the target ip from the ESXi config and reboot the ESX server.
2. Clear the iscsitgt manifest as follows
svccfg delete iscsitgt
svccfg import /var/svc/manifest/system/iscsi_target.xml
svcadm refresh iscsitgt
svcadm restart iscsitgt
iscsitadm list target
It should have none listed now. If you see any, then you will also need to clear the file-based persistent DB files:
rm -R /etc/iscsi/*
svcadm restart iscsitgt
iscsitadm create target -u 0 -b /dev/zvol/rdsk/iscsi/sandbox vmdatastores
3. Add the target back to the initiator and then rescan just the vmhba32 adapter
Do you get the target iqn listed in the VIC GUI?
When you say it disappears, where exactly? The ESXi side?
Do you have more than 1 IP address on the Solaris box?
Can you ssh to the ESXi host and list the vmkbindings file contents?
I'm experiencing the same issue as Patrick.
The iscsi target is a sun x4500, running Solaris 10 (SunOS 5.10 Generic_138889-02). ESXi is running on a dell 1950. The thumper has a couple other multi-terabyte iscsi shares on it being accessed by some win2k8 machines. I can turn off the iscsi access for them temporarily but have to save the data on them.
The win2k8 iscsi shares were originally created with
zfs create -V 1000GB -o shareiscsi=on zpool/sharename
but that didn't work with ESXi, so I used Mike's suggestion of creating the LUNs with (there are LUNs 0-4, but this shows LUN 0)
zfs create -s -b 64K -V 750G zpool/vmshare/lun0
and the target with
iscsitadm create target -u 0 -b /dev/zvol/rdsk/zpool/vmshare/lun0 vmshare
I disabled the initiator in ESXi (storage adapters -> vmhba32 properties -> configure) and rebooted it.
Re-enabled the initiator and did a scan... and nothing appears.
iscsitadm list target on the thumper shows two additional connections to all the targets available (the two win2k8 targets and the ESXi target)
ssh to ESXi machine, vmkiscsi-tool -T vmhba32 shows six targets, two connections each to the two win2k8 targets and the ESXi target
The windows iscsi initiators see all the targets just fine, etc. Is the next step to reset the iscsitgt service manifest on the x4500? Can I reset the service manifest without losing data on the two other win2k8 targets?
cheers!
Hi,
You will need to create some initiator definitions to control access to your targets.
Like the following
iscsitadm create initiator --iqn iqn.2000-04.com.qlogic:qla4050c.esx1 esx1-initiator1
iscsitadm create initiator --iqn iqn.2000-04.com.qlogic:qla4050c.esx1 esx1-initiator2
iscsitadm modify target --acl esx1-initiator1 ss1-target1
iscsitadm modify target --acl esx1-initiator2 ss1-target1
This will only allow the initiators named esx1-initiator1 & 2 to connect to the target alias named ss1-target1.
This is important to prevent corruption and unintentional access.
When I first was testing Solaris I also used an MS-based initiator and found that if it was connected to, or had even discovered, the LUN, it would prevent VMware's iSCSI software initiator from accessing the LUN correctly.
Oh. Almost missed your more important question on the data side.
Clearing the manifest does not destroy data. You need to be sure you have the manifest backed up. But in this case I don't think you need to do that part.
Just define the ACLs and see where that takes you.
I suspect that since your patch level is very current that it will be the answer to some of the issues in this thread.
Okay, I created initiator definitions for the two win2k8 machines and the ESXi machine, and configured ACLs to limit each target to its respective machine.
iscsitadm list target now shows one connection from each of the 2k8 machines (correct) but two connections from the ESXi machine.
disabled iscsi software initiator via the ESXi management app, and moved * from /var/lib/iscsi before rebooting the ESXi machine.
The ESXi machine shows two LUNs, both with LUN ID 0 (there are 5 LUNs, IDs 0-4).
If I go to add storage, I only see one LUN, LUN 0.
Sooo... I'm wondering why ESXi is making two connections. I do have two of the interfaces on the x4500 aggregated together, but I don't really see how that would be relevant (or even noticeable) to the ESXi initiator.
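If the aggregated interfaces are presenting two portal addresses, one way to pin the target to a single portal is a target portal group (TPGT). This is only a sketch; the portal IP and target alias here are hypothetical, so substitute your own:

```shell
# Hypothetical portal IP and target alias: restrict target "vmshare"
# to a single portal so initiators see only one path.
iscsitadm create tpgt 1
iscsitadm modify tpgt -i 192.168.1.10 1   # portal address for group 1
iscsitadm modify target -p 1 vmshare      # target advertises only TPGT 1
```

You can verify the binding afterward with `iscsitadm list tpgt -v` and `iscsitadm list target -v`.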
svccfg export -a iscsitgt results in a syntax error, but svccfg export iscsitgt works. Is that enough of a backup?
when I do
svccfg delete iscsitgt
svccfg import /var/svc/manifest/system/iscsi_target.xml
svcadm refresh iscsitgt
svcadm restart iscsitgt
iscsitadm list target
I assume I will no longer see the win2k8 targets (nor will the win2k8 machines?). What's the appropriate procedure to restore access to these targets? Sorry if this is out of scope.
Do you have more than 1 IP at the target or the initiator?
There are three ways to restore access to the pre-existing targets should you lose or delete the manifest.
The one you already know is to import the backup manifest, which would overwrite the existing one.
The second is to manually edit the active manifest. You would create a target using the same backing store as was defined before. (that info is in the backup file as plain text)
Then you use the svccfg utility to replace the automatically generated target iqn GUID with the correct value, as follows (BTW some of this is in my blog):
svccfg -s iscsitgt listprop | grep target
target_ss1-zrp1/iscsi-name astring iqn.1986-03.com.sun:02:eb9c3683-9b2d-ccf4-8ae0-85c7432f3ef6.ss1-zrp1
Using the returned property value, issue the following:
svccfg -s iscsitgt setprop target_ss1-zrp1/iscsi-name=iqn.1986-03.com.sun:02:thepreviousguid.alias_name
svcadm refresh iscsitgt
svcadm restart iscsitgt
And finally you can edit the backup file to have the correct elements, not a lot of fun but it's a method.
Oh, I missed one element on the export: it needs the >
svccfg export -a iscsitgt > /iscsitgt-backup.xml
Okay, I took some servers down and performed all these steps. I added the first one fine, but then when I add a second LUN it doesn't show up in the VIC GUI.
The vmkbindings file:
Format:
bus target iSCSI
id id TargetName
#
0 0 iqn.2006-01.com.openfiler:tsn.cfbe11328124 0
0 1 iqn.2006-01.com.openfiler:tsn.fa8b6b316e2e 0
0 2 iqn.1986-03.com.sun:02:41bb772a-3249-4589-879f-bd3f6818f70f.vmdatastores 2
I have the following from iscsitadm list target -v:
izfs# iscsitadm list target -v
Target: vmdatastores
iSCSI Name: iqn.1986-03.com.sun:02:41bb772a-3249-4589-879f-bd3f6818f70f.vmdatastores
Connections: 1
Initiator:
iSCSI Name: iqn.1998-01.com.vmware:mercury-04430842
Alias: mercury
ACL list:
TPGT list:
LUN information:
LUN: 0
GUID: 0100001635ab7ab900002a0049752de5
VID: SUN
PID: SOLARIS
Type: disk
Size: 250G
Backing store: /dev/zvol/rdsk/iscsi/sandbox
Status: online
LUN: 1
GUID: 0100001635ab7ab900002a0049752e6c
VID: SUN
PID: SOLARIS
Type: disk
Size: 900G
Backing store: /dev/zvol/rdsk/iscsi/nboaa
Status: online
So it's connecting, and I made sure that all the backing stores were new volumes - but it doesn't work; it doesn't see the second LUN.
Anything else I can try?
Thanks for the help so far. Seems like everyone uses OpenSolaris instead of the release version ... and maybe I should too, if it would solve this issue.
I went ahead and stuffed OS into a VM. After I got past the PEBKAC problem of selecting 32-bit in the VM creation, I configured an iSCSI target with 4 LUNs and pointed ESXi at it. All four showed up perfectly.
So, now I have to figure out how to migrate my Solaris 10 box to OpenSolaris...
You were also quite right about liking OS better than Solaris - it feels a lot more like what I'm used to from a *nix standpoint.
Thanks much!
Yes... related to this, right now my Solaris box is running a 5GB rpool on slice 0 and a 995GB slice that's part of the iscsi pool, on a 1TB disk mirror.
I'm doing work in a VM right now to test migration, but there's a way to do this that doesn't destroy my entire layout, I hope?
My testing seems to show that 4GB is the bare minimum for OS. The real problem is installing to an existing zfs pool without blowing the whole disk away.
I might be able to repartition one disk, then import the (degraded) pool from the other and replace the previous slice with the new partition, then repartition the other disk and mirror both sides again... it's a huge headache, though.
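That disk-at-a-time approach might be sketched like the following, assuming a two-disk mirror with hypothetical device names. Treat it as an outline rather than a tested procedure, since the pool runs unprotected during each step:

```shell
# Hypothetical devices: c1t0d0 and c1t1d0 are the mirrored 1TB disks,
# with slice 1 holding the iscsi pool. The pool has no redundancy
# between the detach and the completed resilver.
zpool detach iscsi c1t1d0s1           # break the mirror, freeing one disk
# ...repartition c1t1d0 and install OpenSolaris on its new root slice...
zpool attach iscsi c1t0d0s1 c1t1d0s1  # re-mirror onto the repartitioned disk
zpool status iscsi                    # wait for the resilver to finish
# then repeat the same steps for the other disk
```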