VMware Cloud Community
rmmcgr
Contributor

Recover iSCSI LUN

Hi,

I have an iSCSI LUN presented via OpenFiler to a standalone ESXi 4 host. For some reason the host lost connectivity to the LUN, and with it the VMs that were running at the time.

I rebooted the host, which didn't help. I can see that the LUN is presented successfully under the storage adapter section of the vSphere client. When I run through the Add Storage wizard, the disk is visible in the storage section. However, the next step of the wizard eventually hangs with the error message "Failed to get disk partition information".

I do not want to simply reset the partition table, and recreate the file system, as I want to recover the LUN and the VMs it contains.

Any ideas?

Thanks in advance,

Richard

idle-jam
Immortal

If you can see the LUN, there is no need to add it again, as all your virtual machines should reappear. Under the Datastore view, are you able to see the label of the datastore you created on that LUN? It should be visible, and you can power up your VMs as normal.


iDLE-jAM | VCP 2, VCP 3 & VCP 4

amalanco8
VMware Employee

Hi rmmcgr, is the VMFS the only partition on the drive? Take a look at this doc:

http://communities.vmware.com/docs/DOC-1382 --- "Failed to update disk partition information-step by step fix"

regards.

blog.hispavirt.com // virtualización en tu idioma VCDX#141

Chamon
Commander

If you go through the Add Storage wizard you will lose the VMs that are on the LUN. Has anything changed on your iSCSI adapter or the target? Are the IQNs the same as they were before the loss of connectivity? Something must have changed, or the host would simply reconnect to the storage.

rmmcgr
Contributor

Hi amalanco8,

The only problem with the article you linked to is that it says that the steps described will basically destroy all data on the LUN.

I am trying to avoid this if I can.

Is there something like a chkdsk command that I could run?

Thanks,

Richard

rmmcgr
Contributor

Hi Chamon,

I cannot work out what changed either. The only thing I can think of is that the network might have become flooded and the host temporarily lost its connection to the Openfiler server. The weird thing is that two other LUNs were being presented and they have reappeared just fine. The difference between those and the one with the problem is that the problem LUN had a running VM on it at the time.

Is there somewhere I can check what the IQN was before? A log file?

Thanks,

Richard

Chamon
Commander

If you have other LUNs that came back then it isn't the IQN. Do you see any errors on the OpenFiler? I haven't worked with it before, so I don't know how much info it would provide. Did the LUN number change? Can you mask the LUNs from the ESXi host and then rescan? I would make sure that all of the LUNs have been released from the host and then try to attach them again.

rmmcgr
Contributor

Hi Chamon,

Is it safe to run vmkfstools -B --breaklock <device> (where <device> is the iSCSI LUN in question)?

Also, I have been looking at http://kb.vmware.com/kb/1002281

When I run fdisk -l on the working LUNs, I get output as described in the KB article. When I run it on the problem LUN, it gives no output at all, not even an error message.
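For context on what fdisk -l is trying to read here: a classic MBR keeps its partition table in the first 512-byte sector of the disk, and a VMFS partition is marked with type 0xfb. The following is a minimal Python sketch of that layout, not an ESXi tool. The function name is made up for illustration, and the sample sector is synthetic (the start LBA and sector count are hypothetical values, not taken from this LUN).

```python
import struct

VMFS_TYPE = 0xFB      # MBR partition type used for VMFS volumes
TABLE_OFFSET = 446    # the partition table starts at this byte of sector 0
ENTRY_SIZE = 16       # four 16-byte entries, then the 0x55 0xAA signature


def read_mbr_partitions(sector):
    """Return (slot, type, start_lba, sector_count) for each used MBR entry."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("no valid MBR signature")
    partitions = []
    for slot in range(4):
        offset = TABLE_OFFSET + slot * ENTRY_SIZE
        # Entry layout: status, CHS start, type, CHS end, start LBA, sector count
        _status, _chs_first, ptype, _chs_last, start_lba, count = struct.unpack_from(
            "<B3sB3sII", sector, offset)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append((slot + 1, ptype, start_lba, count))
    return partitions


# Synthetic sector for illustration only: one VMFS entry with made-up geometry.
sample = bytearray(512)
struct.pack_into("<B3sB3sII", sample, TABLE_OFFSET,
                 0, b"\x00" * 3, VMFS_TYPE, b"\x00" * 3, 128, 2097152)
sample[510:512] = b"\x55\xaa"

for slot, ptype, start, count in read_mbr_partitions(bytes(sample)):
    label = "VMFS" if ptype == VMFS_TYPE else "other"
    print(f"partition {slot}: type 0x{ptype:02x} ({label}), "
          f"start LBA {start}, {count} sectors")
```

If sector 0 cannot be read at all, there is nothing to parse, which would be consistent with fdisk printing nothing for the problem LUN.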

Thanks,

Richard

Chamon
Commander

I have not used --breaklock in the past. Were you able to release the LUNs, ensure that the host does not see the storage at all, and then reattach them? Your other LUNs don't have any data on them, correct? Do you have anything pointing to a problem in the logs on the host?

rmmcgr
Contributor

I removed the link to the iSCSI storage and re-established it (via the GUI). In /var/log/messages, I saw these entries:

Nov 22 20:20:37 vmkernel: 305:19:15:25.497 cpu2:121367949)ScsiCore: 1181: Sync CR (opcode 28) at 384 (wid 0)

Nov 22 20:20:38 vmkernel: 305:19:15:26.409 cpu2:121367949)ScsiCore: 1181: Sync CR (opcode 28) at 368 (wid 0)

Nov 22 20:20:39 vmkernel: 305:19:15:27.403 cpu2:121367949)ScsiCore: 1181: Sync CR (opcode 28) at 352 (wid 0)

Nov 22 20:20:39 vmkernel: 305:19:15:27.772 cpu2:121367949)ScsiDeviceIO: 1056: CmdSN 0x0 to device t10.F405E46494C454009363A4155316D2A7468313D2E60713E6 timed out: expiry time occurs 11ms in the past

Nov 22 20:20:39 vmkernel: 305:19:15:27.772 cpu2:121367949)WARNING: ScsiDeviceIO: 1266: Failed to issue command (0x28) on device t10.F405E46494C454009363A4155316D2A7468313D2E60713E6: Timeout

Nov 22 20:20:39 vmkernel: 305:19:15:27.772 cpu2:121367949)WARNING: Partition: 705: Partition table read from device t10.F405E46494C454009363A4155316D2A7468313D2E60713E6 failed: Timeout

Nov 22 20:20:39 vmkernel: 305:19:15:27.773 cpu2:121367949)ScsiDevice: 1830: Successfully registered device "t10.F405E46494C454009363A4155316D2A7468313D2E60713E6" from plugin "NMP" of type 0

Nov 22 20:20:39 Hostd: SendStorageInfoEvent() called

Nov 22 20:20:39 Hostd: SendStorageInfoEvent() called

Nov 22 20:20:39 Hostd: ReconcileVMFSDatastores called: refresh = true, rescan = false

Nov 22 20:20:39 Hostd: RefreshVMFSVolumes called

Nov 22 20:20:39 Hostd: StoragePathUpdate: Path state change event generated Storage related Notifications

Nov 22 20:20:39 Hostd: StoragePathUpdate Refresh --: timestamp=1290457159818506 updated=0x66f2fca8

Everything is as before: the storage adapter shows the three LUNs, and the storage configuration section shows only two VMFS volumes, not the critical third volume.

I am Googling the above, but nothing stands out at the moment.
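To make the log above easier to read, here is a small Python sketch that pulls out the vmkernel WARNING lines, groups them by device, and names the SCSI opcode. The three sample lines are copied from the log in this post. The regular expressions and the function name are my own assumptions about the line format, based only on these lines; the opcode names are the standard SCSI ones (0x28 is READ(10)).

```python
import re
from collections import defaultdict

# Two warning lines and one informational line copied from the log above.
LOG = """\
Nov 22 20:20:39 vmkernel: 305:19:15:27.772 cpu2:121367949)WARNING: ScsiDeviceIO: 1266: Failed to issue command (0x28) on device t10.F405E46494C454009363A4155316D2A7468313D2E60713E6: Timeout
Nov 22 20:20:39 vmkernel: 305:19:15:27.772 cpu2:121367949)WARNING: Partition: 705: Partition table read from device t10.F405E46494C454009363A4155316D2A7468313D2E60713E6 failed: Timeout
Nov 22 20:20:39 vmkernel: 305:19:15:27.773 cpu2:121367949)ScsiDevice: 1830: Successfully registered device "t10.F405E46494C454009363A4155316D2A7468313D2E60713E6" from plugin "NMP" of type 0
"""

# A few standard SCSI operation codes.
SCSI_OPCODES = {
    0x00: "TEST UNIT READY",
    0x12: "INQUIRY",
    0x25: "READ CAPACITY(10)",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
}

# Patterns assumed from the sample lines, not from any documented log format.
WARNING_RE = re.compile(r"WARNING: (\w+): \d+: (.*)$")
DEVICE_RE = re.compile(r"device (t10\.[0-9A-Fa-f]+)")
OPCODE_RE = re.compile(r"command \((0x[0-9A-Fa-f]+)\)")


def warnings_by_device(log_text):
    """Map each device name to the (module, message) warnings that mention it."""
    found = defaultdict(list)
    for line in log_text.splitlines():
        warning = WARNING_RE.search(line)
        device = DEVICE_RE.search(line)
        if not (warning and device):
            continue  # skip informational lines
        module, message = warning.groups()
        opcode = OPCODE_RE.search(message)
        if opcode:
            name = SCSI_OPCODES.get(int(opcode.group(1), 16), "unknown opcode")
            message += f" [{name}]"
        found[device.group(1)].append((module, message))
    return dict(found)


for device, entries in warnings_by_device(LOG).items():
    print(device)
    for module, message in entries:
        print(f"  {module}: {message}")
```

Both warnings point at the same device and both are read timeouts: the host registers the device but gets no answer when it tries to read the partition table from it.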

Regards,

Richard

Chamon
Commander
Accepted solution

Were you able to find anything in the logs of the OpenFiler? Do you have more than the three LUNs on the OpenFiler? If so, could this LUN be a different LUN than the one you think it is? Is there a chance that a Windows or Linux machine could have this LUN locked? Or do you have permissions configured so that these LUNs are only available to this one ESXi host? For some reason the ESXi host appears to think that this LUN does not contain a VMFS partition. Does the OpenFiler have any snapshots of this LUN?

If the ESXi host can see the LUN I would start looking to the storage and see if you can find any configuration issues there.

rmmcgr
Contributor

I have it configured so that only this host can access this LUN. I think that this is the correct LUN as I have been making a note of the ID numbers and all are matching up. This is definitely the one.
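One more way to cross-check the ID: the hex part of the t10 name in the vmkernel log appears to be ASCII with the two digits of each byte swapped. That is an assumption inferred from this one string, not documented behaviour I can point to, and the function name below is made up. A minimal Python sketch:

```python
def decode_t10_name(name):
    """Decode a 't10.<hex>' name, assuming each byte's two hex digits are swapped."""
    prefix, _, hex_part = name.partition(".")
    if prefix != "t10" or len(hex_part) % 2:
        raise ValueError("expected 't10.' followed by an even number of hex digits")
    # Swap the digits of every pair: "F4" becomes "4F", which is ASCII 'O'.
    swapped = "".join(hex_part[i + 1] + hex_part[i]
                      for i in range(0, len(hex_part), 2))
    return bytes.fromhex(swapped).decode("ascii", errors="replace")


# Device name copied from the vmkernel log earlier in the thread.
DEVICE = "t10.F405E46494C454009363A4155316D2A7468313D2E60713E6"

vendor, _, rest = decode_t10_name(DEVICE).partition("\x00")
print(vendor)
print(rest)
```

For the name in the log this gives OPNFILE followed by 96JQ5a-zd81-np1n, so the device the host is timing out on does look like an Openfiler LUN. The second part may be comparable with an identifier shown on the Openfiler side, but I have not verified what it refers to.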

I will have a look at the Openfiler side of things as you suggest and see what I can find out there and post back.

Thanks,

Richard

rmmcgr
Contributor

OK, I started to explore how I could check the status in Openfiler. I realised that the one thing I had not done was reboot Openfiler. In the web GUI console I noticed a "check disk on reboot" tick box, so I ticked it and rebooted.

On reboot it found some errors and fixed them. Then ESXi was able to see the LUN! Finally!

Thank you so much for everyone's help.

Regards,

Richard
