VMware Cloud Community
rlev
Contributor

iSCSI Datastores Missing After Upgrade

We had a two-host cluster running off an OpenFiler SAN with iSCSI. Yesterday, we decided to upgrade one of the hosts from ESX 3.5 to vSphere 4 via Update Manager. The upgrade appeared to go just fine; it installed, rebooted, came back up, etc. Except now, none of the iSCSI datastores are there. The storage adapter on the vSphere 4 host is configured correctly, and it sees the LUNs perfectly fine... it just doesn't bring the datastores back up or even show them at all. If we go to 'Add Storage' there are no available LUNs listed. The second host (still ESX 3.5) is still connected to the datastore just fine and has VMs running off it. We tried booting from the 4.0 CD and doing a clean install, but got the same result. It sees the LUNs just fine, they show up in 'fdisk -l', but the datastore doesn't come back. Anyone else seen this before or have any suggestions? Thanks!

And yes they are VMFS3 datastores, not VMFS2.

14 Replies
dtracey
Expert

Hi, have you checked your vSwitch settings following the upgrade? I was just on another post on the forum with a guy using OpenFiler, and after we looked at everything else, it turned out he needed to enable promiscuous mode on the vSwitch/port group passing iSCSI traffic to OpenFiler. Worth a quick look?

Dan

rlev
Contributor

I just checked and promiscuous mode was set to 'Reject' on that vSwitch. I changed it over to 'Accept', rescanned all HBAs and VMFS volumes, refreshed the host storage, and still no luck. The strange thing is that the host sees the LUNs, and OpenFiler shows that the ESX iSCSI initiator is connected to the targets, but the datastore does not come up at all.

dtracey
Expert

Hi mate,

Have a look at this link:

http://ict-freak.nl/2006/11/19/double-lun-ids-with-esx3-in-combination-with-openfiler/

And see what you think (I realise it relates primarily to ESX 3...)

Dan

rlev
Contributor

Thanks for the link. That article is from 2006, though, and the issue it describes has already been fixed. I checked the mentioned conf file on our OpenFiler box and it does have them listed with different LUN numbers. Also, as I mentioned before, the datastores are still working flawlessly on the host that is still on 3.5.

I did find this book about VCP and 3.5.

http://books.google.com/books?id=WmoMK8eBkTkC&pg=PA173&lpg=PA173&dq=esxchangeiscsi+wwn&source=bl&ots=evVQikWxL4&sig=a1lnxMfwhm3DtF0M_xJ-78byOzc&hl=en&ei=friKSomdFJGeMN7_4MsP&sa=X&oi=book_result&ct=result&resnum=1#v=onepage&q=&f=false

At the bottom of page 174, it mentions that the iSCSI initiator will always be assigned as vmhba32. This is verified on our 3.5 host: the iSCSI adapter is vmhba32. However, on the upgraded host (4.0), it is now vmhba35. It is also showing more storage adapters now, I'm guessing due to the expanded support for local storage controllers.

dtracey
Expert

Anything weird in the vmkernel logs? Anything relating to SCSI reservations / conflicts?

(FWIW - just checked my vSphere hosts and the iSCSI software adapter is set to vmhba33; this was a fresh install rather than an upgrade, though.)

rlev
Contributor

Well, there were plenty of errors in the log. I have attached the relevant parts of the vmkernel log below for review. It appears that it is seeing the LUNs as snapshots. This was a simple fix in 3.5 with the LVM.DisallowSnapshotLun advanced setting, but I didn't find that in 4.0. I was able to find some info on the esxcfg-volume command, and this is the output of a list with that command:

root@esx1 /# /usr/sbin/esxcfg-volume -l

VMFS3 UUID/label: 4a53767e-bbf00881-7efe-002219232681/Backup_SAN

Can mount: No (some extents missing)

Can resignature: No (some extents missing)

Extent name: t10.F405E46494C45400E4A50553A6D6D223233315D207472775:1 range: 0 - 1999871 (MB)

VMFS3 UUID/label: 49b8035f-fe767c87-6a5f-00221921495d/VM_Storage_SAN

Can mount: No (some extents missing)

Can resignature: No (some extents missing)

Extent name: t10.F405E46494C45400130375832645D226F48317D293674656:1 range: 0 - 1572607 (MB)

Extent name: t10.F405E46494C4540043C62725A4A4D2A4B635D4D2D4444336:1 range: 1572608 - 3145215 (MB)

Then I tried a -m to mount one of them and got the following:

root@esx1 /# /usr/sbin/esxcfg-volume -m 4a53767e-bbf00881-7efe-002219232681

Mounting volume 4a53767e-bbf00881-7efe-002219232681

Error: Unable to mount this VMFS3 volume due to some extents missing

So, where did these extents go? Are they missing because it is seeing them as snapshots? If so, how can I remedy that given the situation that I still have a 3.5 host currently running critical servers off of the datastore on that SAN?

EDIT: I'm also seeing lots of warnings about being unable to read volume header and the address being unmapped. Is this just related to it being seen as a snapshot LUN and possibly needing to be resignatured? What are the effects of resignaturing an online datastore?
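For anyone landing here later: vSphere 4 replaced the old LVM.DisallowSnapshotLun/LVM.EnableResignature advanced settings with per-volume handling via esxcfg-volume. A rough sketch of the options involved, using the UUID from the output above (resignaturing assigns a new UUID, so treat it as disruptive and check with VMware support first):

```shell
# List VMFS volumes the host has detected as snapshots/replicas
esxcfg-volume -l

# Mount a snapshot volume with its existing signature (persistent
# across reboots); only safe while the original volume is NOT also
# visible to this host
esxcfg-volume -M 4a53767e-bbf00881-7efe-002219232681

# Or assign a new signature - the datastore gets a new UUID, so VMs
# registered against the old path must be re-registered afterwards
esxcfg-volume -r 4a53767e-bbf00881-7efe-002219232681

# Rescan so the host picks up the change (vmhba35 is the adapter
# number mentioned earlier in this thread)
esxcfg-rescan vmhba35
```

Note that neither -M nor -r will work while extents are reported missing; every extent of a spanned volume has to be visible to the host before it can be mounted or resignatured.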

joergriether
Hot Shot

Now the most important question first: did you actually use extents, or does the system only think you are using extents?

Now let's continue:

- Are there any snapshots?

- Was ANY LUN under ESX 3.5 created at EXACTLY 2TB? (That would be bad with ESX 4...)

regards

Joerg

rlev
Contributor

Yes, we did use extents in order to overcome the 2TB LUN size limit.

None of them are snapshots. We created each individual LUN and added them to the iSCSI target, then added them all back in to a single datastore as extents.

None of them is exactly 2TB. The closest any of them comes to that is one at 1.9TB.

joergriether
Hot Shot

OK, there would be several ways to examine this now, but I suggest another approach for safety reasons, because I believe something has gone bad with your volumes. But first:

I strongly advise you to open a support case with VMware! Disk problems can get very nasty and need to be examined by VMware.

Thus, I only write the following lines for TESTING and EXPERIMENTAL purposes - use them at your own risk:

As you have full access from the 3.5 side, I suggest you do this:

Use the ESX 4 machine to create a new datastore and map this datastore to your ESX 3.5 machine too. Now shut down the corresponding VMs using your ESX 3.5 machine and migrate them to the new datastore. Then boot them with your ESX 4 machine.

Idea?

best,

Joerg

smccreadie
Enthusiast

Hello,

I've had a lot of similar issues with my upgrade to ESX 4 with IET (which is what OpenFiler uses). What I found was that ESX was connected to the targets but was seeing the volumes as snapshots and wouldn't mount them, just like your issue.

When I looked at the sessions that were connected to the target (cat /proc/net/iet/session), I discovered that there was more than one session (sid) for some of the targets. It seems that when I rebooted any of the ESX hosts, the old session didn't go away, and when the host came back up the initiator created a new session. So when ESX looks at the volumes on these LUNs, it thinks they are copies (SAN snapshots) and won't mount them.

My solution was to remove the orphaned SIDs in IET and rescan in ESX, but I'm not sure this is possible in OpenFiler. It seems like the best thing to do would be to disable the iSCSI initiator on the host, then open the properties of the initiator, click the static targets tab, and remove each of the old static targets. Then reboot ESX, re-enable the initiator, and re-add the iSCSI target. Rebooting OpenFiler would be nice too, but you may have running VMs that you can't shut down.

I hope this helps. I battled with my iSCSI config for weeks after the upgrade to ESX 4, after it worked flawlessly for a year or more on ESX 3.5. Thanks
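If your OpenFiler box gives you shell access, the stale-session check described above looks roughly like this. The tid/sid/cid values below are placeholders to be read from the session listing, and ietadm (part of the iSCSI Enterprise Target tools) may not be exposed through the OpenFiler web UI:

```shell
# Show targets and the sessions attached to them; each initiator
# connection appears as a sid/cid pair under its tid
cat /proc/net/iet/session

# Drop a stale connection left behind by a rebooted ESX host
# (tid/sid/cid here are placeholders - take them from the output above)
ietadm --op delete --tid=1 --sid=562949953421312 --cid=0

# Then rescan from the ESX side so it logs in with a fresh session
# (vmhba35 is the adapter number from earlier in this thread)
esxcfg-rescan vmhba35
```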

Sean

rlev
Contributor

Thanks for all the help, guys. What we ended up doing was migrating all the VMs off the SAN (using the 3.5 host) to local storage on the 4.0 host. Then we removed the datastore from the 3.5 host, deleted and recreated the iSCSI target on the SAN, created a new datastore on the 4.0 host using the new targets, migrated all the VMs from local storage back to the SAN, and finally upgraded the 3.5 host to 4.0, which picked up the new datastore just fine. Not the best of solutions, but it got us back up and running without having to waste TOO much time reading/researching fixes. Thanks again everyone!

TimPhillips
Enthusiast

That's why I don't like free iSCSI solutions: you can get into big trouble with them. Moreover, these solutions are not as free as they appear. Anyway, I prefer a different solution to OpenFiler.

bpjadam
Contributor

We just suffered the same thing after upgrading some ESX 3.5 hosts to ESX 4. The LUNs from our SAN were visible to the ESX4 hosts, but not usable: no volumes found, can't read partition tables, etc. etc.

However, all the LUNs were still working fine on the ESX 3.5 host.

After logging in directly to an ESX4 host using the "unsupported" cmd line mode (essential tool, IMHO), we were able to troubleshoot the issue down to the LUN devices being literally unreadable by fdisk and dd. i.e. the host could see them but not actually read them.

What was interesting is that another LUN from the same iSCSI server was working just fine in ESX 4. The difference? It turns out the naming of the LUNs was the problem. All the "bad" LUNs were named using a nas2:2:1-style pattern, whereas the working LUN was named IETF_____ (the default).

Removing the colons and changing them to underscores finally solved the problem.
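For anyone hitting the same thing, the rename that worked here just swaps colons for underscores, e.g. (nas2:2:1 is the example name from this post):

```shell
# Convert a colon-style LUN name to the underscore form ESX 4 accepted
old="nas2:2:1"
new=$(printf '%s' "$old" | tr ':' '_')
echo "$new"   # prints nas2_2_1
```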

DSTAVERT
Immortal

Your issue is different from the one in the original post. The original post had a problem with a size limit on LUNs. Your post has some real value, and I would suggest that you make a new post, move all the information to it, and mark this one answered. Remove the information from this one so as not to confuse someone with a problem on LUN size.

-- David -- VMware Communities Moderator