VMware Cloud Community
adambg
Contributor

ESXi 4 LUN lost... Help!

Suddenly both of my ESXi hosts lost their connection to one of the SAN storage units.

All the VMs on that SAN became inaccessible. The VMs seem to be powered on, but their hard disks are lost.

It appears that only one LUN on that storage is missing.

In vSphere -> Config -> Storage I no longer see that LUN (S02DS05), but the other LUNs on that SAN are still there and functioning.

The LUN still seems to be registered on the ESXi host: when I run ls -l /vmfs/volumes, it appears in the list with all the other LUNs:

drwxr-xr-x 1 root root 8 Jan 1 1970 1bd3ca79-a92d52ae-679b-b665783fa67d
drwxr-xr-t 1 root root 1120 Feb 6 19:16 4a927f4d-662f02c7-1c45-001018564cdc
drwxr-xr-t 1 root root 2380 Feb 7 12:35 4a927f93-024f133a-d624-001018564cdc
drwxr-xr-t 1 root root 2520 Dec 29 20:02 4a927fd0-6c6cc79a-490c-001018564cdc
drwxr-xr-t 1 root root 1540 Feb 9 06:03 4a93ddea-157c2948-55c1-001018564cdc
drwxr-xr-x 1 root root 8 Jan 1 1970 4aa505c5-adb24296-8bc5-0024e87635da
drwxr-xr-t 1 root root 980 Sep 7 13:08 4aa505d6-10868926-7732-0024e87635da
drwxr-xr-x 1 root root 8 Jan 1 1970 50ce1573-fd9ff22f-099b-0bbeede6c61e
l---- 0 root root 1984 Jan 1 1970 External01 (1) -> d4c0c593-bb854445
l---- 0 root root 1984 Jan 1 1970 Hypervisor1 -> 1bd3ca79-a92d52ae-679b-b665783fa67d
l---- 0 root root 1984 Jan 1 1970 Hypervisor2 -> 50ce1573-fd9ff22f-099b-0bbeede6c61e
l---- 0 root root 1984 Jan 1 1970 Hypervisor3 -> efd8efe3-03bc1cbf-15e0-080efd9e7379
l---- 0 root root 1984 Jan 1 1970 Local -> 4aa505d6-10868926-7732-0024e87635da
l---- 0 root root 1984 Jan 1 1970 S01DS02 -> 4a927f4d-662f02c7-1c45-001018564cdc
l---- 0 root root 1984 Jan 1 1970 S01DS03 -> 4a927f93-024f133a-d624-001018564cdc
l---- 0 root root 1984 Jan 1 1970 S01DS04 -> 4a927fd0-6c6cc79a-490c-001018564cdc
l---- 0 root root 1984 Jan 1 1970 S01NFS01-1 -> db3a1c08-9120861b
l---- 0 root root 1984 Jan 1 1970 S02DS05 -> 4a93ddc2-7e3dcff0-dea7-001018564cdc
l---- 0 root root 1984 Jan 1 1970 S02DS06 -> 4a93ddea-157c2948-55c1-001018564cdc
l---- 0 root root 1984 Jan 1 1970 S02NFS01 -> fdce5791-7564eac1
drwxrwxrwx 1 root users 4096 Feb 7 08:45 db3a1c08-9120861b
drwxr-xr-x 1 root root 8 Jan 1 1970 efd8efe3-03bc1cbf-15e0-080efd9e7379
drwxrwxrwx 1 root users 4096 Nov 6 20:41 fdce5791-7564eac1

But I cannot see the contents:

ls -l S02DS05/

ls: S02DS05/: Bad address
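
One detail in the listing: S02DS05 points to 4a93ddc2-7e3dcff0-dea7-001018564cdc, but no directory with that UUID appears among the entries. A minimal sketch for listing labels in that state, assuming the labels behave like ordinary symlinks and the host shell supports the POSIX [ -L ] and [ -e ] tests (list_dangling_labels is a hypothetical helper name, not an ESXi command):

```shell
# Print every symlink in a directory whose target cannot be resolved.
# list_dangling_labels is a hypothetical helper, not part of ESXi.
list_dangling_labels() {
    dir=${1:-/vmfs/volumes}
    for entry in "$dir"/*; do
        # -L: the entry is a symlink; ! -e: following the link fails
        if [ -L "$entry" ] && [ ! -e "$entry" ]; then
            printf '%s\n' "${entry##*/}"
        fi
    done
}
```

Called with no argument it checks /vmfs/volumes; any other directory can be passed as the first argument.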

/var/log/messages shows way too many entries, and I cannot even start pinpointing the problem.

I tried a rescan; it found the LUN, but it wants to format it.

How can I browse that LUN and extract the data from it?

How can I re-register the LUN without formatting it?

Help....

adambg
Contributor

Running esxcfg-mpath -l shows that LUN:

iqn.1998-01.com.vmware:frodo-68a28134-00023d000002,iqn.1992-04.com.emc:ix4-200r.storage02.LUN04,t,1-naa.5000144f56002328
   Runtime Name: vmhba34:C0:T4:L0
   Device: naa.5000144f56002328
   Device Display Name: EMC iSCSI Disk (naa.5000144f56002328)
   Adapter: vmhba34 Channel: 0 Target: 4 LUN: 0
   Adapter Identifier: iqn.1998-01.com.vmware:frodo-68a28134
   Target Identifier: 00023d000002,iqn.1992-04.com.emc:ix4-200r.storage02.LUN04,t,1
   Plugin: NMP
   State: active
   Transport: iscsi
   Adapter Transport Details: iqn.1998-01.com.vmware:frodo-68a28134
   Target Transport Details: IQN=iqn.1992-04.com.emc:ix4-200r.storage02.LUN04 Alias= Session=00023d000002 PortalTag=1
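
On a host with many paths, the fields that matter here (runtime name, device and state) can be pulled out of that output. A rough sketch, assuming awk is available in the host shell and that the field labels match the output above (summarize_paths is a hypothetical helper name):

```shell
# Read `esxcfg-mpath -l` output on stdin and print one line per path:
# "<runtime name> <device> <state>".
# summarize_paths is a hypothetical helper, not part of ESXi.
summarize_paths() {
    awk -F': ' '
        /^[ \t]*Runtime Name: / { name = $2 }
        /^[ \t]*Device: /       { dev = $2 }
        /^[ \t]*State: /        { print name, dev, $2 }
    '
}
```

Usage would be along the lines of: esxcfg-mpath -l | summarize_paths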

How do I add it back to vCenter without formatting it?

This is a production environment, and I don't want to lose any more data.

Kasraeian
Expert

Hi,

Check item 17, "When I try to add an existing VMFS datastore to my host, ESXi wants to format the storage", from this link.

I hope it helps you a little.



-= If you found this note/reply useful, please consider awarding points for "Correct" or "Helpful" =-

-= If there's any mistake in my notes, please correct me! =-

-= Thanks =-

MCTS, VCP

Sohrab Kasraeianfard | http://www.kasraeian.com | @Kasraeian

kastlr
Expert

Hello,

When the LUN is visible to the host but the datastore is not, your datastore might have been recognized as a snapshot.

By default, a snapshotted datastore isn't mounted.

If your datastore is recognized as a snapshot, the Add Storage wizard will offer you two additional options.

- Keep the existing signature

- Assign a new signature

If you don't get these options, you might be able to get your data back by following the article "Recovering a lost partition table on a VMFS volume".
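
Whether the host sees the datastore as a snapshot can also be checked from the command line. A sketch, assuming the esxcfg-volume command is available on the host (on ESXi 4 that would mean Tech Support Mode); run, list_snapshot_volumes and mount_keep_signature are hypothetical helper names, and DRY_RUN=1 only prints the command instead of executing it:

```shell
# Hypothetical wrapper: with DRY_RUN=1 the command is printed, not executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# List volumes the host has detected as snapshots or replicas.
list_snapshot_volumes() {
    run esxcfg-volume -l
}

# Persistently mount such a volume by label or VMFS UUID,
# keeping its existing signature.
mount_keep_signature() {
    run esxcfg-volume -M "$1"
}
```

For example, DRY_RUN=1 mount_keep_signature S02DS05 shows the command that would be run for the datastore label from this thread. If the listing comes back empty, the datastore is not being treated as a snapshot and these options will not apply.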


Hope this helps a bit.

Greetings from Germany. (CET)


Kasraeian
Expert

Hi,

Thanks for the link, I have added it to my Favorites ;)



adambg
Contributor

The storage is listed under Storage Adapters, but rescanning does not do anything.

There are no errors mentioning "snapshot" in /var/log/messages.

I looked for LVM.EnableResignature and LVM.DisallowSnapshotLun, but there is no LVM option in my Advanced Config menu, so this is not an option either.

So I started the recovery process:

The command esxcfg-scsidevs -c listed all my LUNs, including the faulty one:

Device UID                            Device Type    Console Device                                            Size      Plugin  Display Name
mpx.vmhba0:C0:T1:L0                   CD-ROM         /vmfs/devices/genscsi/mpx.vmhba0:C0:T1:L0                 0MB       NMP     Local TEAC CD-ROM (mpx.vmhba0:C0:T1:L0)
naa.5000144f07218796                  Direct-Access  /vmfs/devices/disks/naa.5000144f07218796                  614400MB  NMP     EMC iSCSI Disk (naa.5000144f07218796)
naa.5000144f08816316                  Direct-Access  /vmfs/devices/disks/naa.5000144f08816316                  614400MB  NMP     EMC iSCSI Disk (naa.5000144f08816316)
naa.5000144f13276635                  Direct-Access  /vmfs/devices/disks/naa.5000144f13276635                  453632MB  NMP     EMC iSCSI Disk (naa.5000144f13276635)
naa.5000144f56002328                  Direct-Access  /vmfs/devices/disks/naa.5000144f56002328                  614400MB  NMP     EMC iSCSI Disk (naa.5000144f56002328)
naa.5000144f66794210                  Direct-Access  /vmfs/devices/disks/naa.5000144f66794210                  614400MB  NMP     EMC iSCSI Disk (naa.5000144f66794210)
naa.600508e000000000df257be61725c406  Direct-Access  /vmfs/devices/disks/naa.600508e000000000df257be61725c406  152064MB  NMP     Local Dell Disk (naa.600508e000000000df257be61725c406)

But I get no output when running the command fdisk -l /vmfs/devices/disks/naa.5000144f56002328

When I run it on a different device the output is correct...

Running fdisk without the -l shows this:

fdisk /vmfs/devices/disks/naa.5000144f56002328

fdisk: cannot read from /vmfs/devices/disks/naa.5000144f56002328
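
A quick way to compare the faulty device against a healthy one is to try reading a single sector from each. A sketch, assuming dd is available in the host shell (can_read_sector and check_devices are hypothetical helper names):

```shell
# Return 0 if one 512-byte sector can be read from the given path.
# can_read_sector is a hypothetical helper, not part of ESXi.
can_read_sector() {
    dd if="$1" of=/dev/null bs=512 count=1 2>/dev/null
}

# Report readability for each device path passed as an argument.
check_devices() {
    for dev in "$@"; do
        if can_read_sector "$dev"; then
            printf '%s readable\n' "$dev"
        else
            printf '%s UNREADABLE\n' "$dev"
        fi
    done
}
```

For example: check_devices /vmfs/devices/disks/naa.5000144f56002328 /vmfs/devices/disks/naa.5000144f66794210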

Any additional suggestions would be appreciated.

kastlr
Expert
Accepted solution

Hi,

the LVM options have been moved to the Add Storage wizard in vSphere, as described above.

If your ESX host is unable to read from the LUN, it cannot mount your datastore.

One option is to open an SR with EMC so they can verify whether the storage array has a problem presenting the LUN properly.

Another option would be to use vmkfstools to perform a lunreset or a breaklock operation.


Hope this helps a bit.

Greetings from Germany. (CET)


adambg
Contributor

Another weird problem: under Storage Adapters, one of the ESXi hosts shows the LUN with 0GB capacity, while the other host shows the normal capacity (600GB).

Refresh, rescan, anything I try makes no difference.

kastlr
Expert

Hi,

this could happen if a SCSI reservation is in place, which would prevent a node from accessing that LUN.

Depending on the storage array you're using (assuming CLARiiON), the following steps might clear the error condition:

- remove the LUN from the storage group

- rescan storage on the ESX servers

- re-add the LUN to the storage group

- rescan storage on the ESX servers


Hope this helps a bit.

Greetings from Germany. (CET)


adambg
Contributor

It worked! Running vmkfstools -L lunreset solved the issue; after a rescan the datastore is alive and kicking!
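
For anyone finding this later, a sketch of the sequence. The post does not give the full command line, so the device path and adapter name are assumptions taken from the esxcfg-scsidevs and esxcfg-mpath output earlier in the thread; lunreset_plan is a hypothetical helper that only prints the commands so they can be reviewed before running:

```shell
# Print the reset and rescan commands for review instead of running them.
# lunreset_plan is a hypothetical helper, not part of ESXi.
lunreset_plan() {
    device=$1    # assumed: naa.5000144f56002328, the unreadable device
    adapter=$2   # assumed: vmhba34, the iSCSI adapter serving that LUN
    printf 'vmkfstools -L lunreset /vmfs/devices/disks/%s\n' "$device"
    printf 'esxcfg-rescan %s\n' "$adapter"
}
```

Calling lunreset_plan naa.5000144f56002328 vmhba34 prints the two commands. A LUN reset clears SCSI reservations held on the device, so it is worth confirming that no other host is legitimately using the LUN first.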

Thanks for the help!
