Problem accessing vmfs after SAN outage

Milko · ‎04-30-2008

I had a problem with the SAN storage, and an ESX datastore hosting many VMs went offline.

I was able to recover the physical volume, but unfortunately my ESX server no longer sees it as a working datastore.

I suppose this is due to the unclean filesystem shutdown.

From the Virtual Infrastructure Client I can see the volume if I try to "add a storage", but if I do this all the data will be lost.

This is an urgent problem: can anyone help me recover from this situation, or at least access the filesystem to copy the files to another location?

Thank you very much for the help!

Milko Vaccaro

oreeh · ‎04-30-2008

FYI: this thread has been moved to the ESX 3.5 Configuration forum.

Oliver Reeh

VMware Communities User Moderator (http://communities.vmware.com/docs/DOC-2444)

cheeko · ‎04-30-2008

Call support; they have procedures for situations like this, and you really don't want to wipe what you have.

christianZ · ‎04-30-2008

Check the /var/log/vmkernel log file for any warnings about volume resignaturing.

espi3030 · ‎04-30-2008

Make sure the LUNs are still presented to your ESX server with the correct LUN IDs. You are right not to re-add the storage via the "Add storage" link. You might need to power down all VMs that are accessing the SAN storage and reboot the ESX host. If you aren't the SAN admin, get in touch with them; they should be able to assist you.

Milko · ‎04-30-2008

The ESX has been rebooted and the storage adapters rescanned.

I can see the volume with path vmhba1:0:0 LUN 0, but I don't know how it was defined in the ESX before the outage.

In the vmkernel logs, the only related messages I can find come from the storage adapter rescans:

...

Apr 29 20:34:11 lab131169 vmkernel: 0:00:03:14.163 cpu3:1037)LinSCSI: 689: Queue depth for device vmhba1:0:0 is 32

Apr 29 20:34:11 lab131169 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba1:0:0 : 0x0 0x80 0x83

Apr 29 20:34:11 lab131169 vmkernel: VMWARE SCSI Id: Device id info for vmhba1:0:0: 0x1 0x3 0x0 0x10 0x60 0x5 0x7 0x68 0x1 0x80 0x80 0xcc 0x68 0x0 0x0 0x0 0x0 0x0 0x0 0xa 0x1 0x14 0x0 0x4 0x0 0x0 0x0 0x1 0x1 0x15 0x0 0x4 0x0 0x0 0x0 0x0 0x1 0x10 0x0 0x10 0x60 0x5 0x7 0x68 0x1 0x80 0x80 0xcc 0x

Apr 29 20:34:11 lab131169 vmkernel: 68 0x0 0x0 0x0 0x0 0x0 0x0 0x1

Apr 29 20:34:11 lab131169 vmkernel: VMWARE SCSI Id: Id for vmhba1:0:0 0x60 0x05 0x07 0x68 0x01 0x80 0x80 0xcc 0x68 0x00 0x00 0x00 0x00 0x00 0x00 0x0a 0x32 0x31 0x34 0x35 0x20 0x20

Apr 29 20:34:11 lab131169 vmkernel: 0:00:03:14.164 cpu3:1037)SCSI: 8053: vmhba1:0:0:0 Retry (unit attn)

Apr 29 20:34:11 lab131169 vmkernel: 0:00:03:14.164 cpu3:1037)SCSI: 1473: Device vmhba1:0:0 is attached to an IBM SVC.

Apr 29 20:34:11 lab131169 vmkernel: 0:00:03:14.164 cpu3:1037)SCSI: 2044: Setting default path policy to MRU on target vmhba1:0:0

...

Apr 29 20:45:43 lab131169 vmkernel: VMWARE SCSI Id: Supported VPD pages for vmhba1:0:0 : 0x0 0x80 0x83

Apr 29 20:45:43 lab131169 vmkernel: VMWARE SCSI Id: Device id info for vmhba1:0:0: 0x1 0x3 0x0 0x10 0x60 0x5 0x7 0x68 0x1 0x80 0x80 0xcc 0x68 0x0 0x0 0x0 0x0 0x0 0x0 0xa 0x1 0x14 0x0 0x4 0x0 0x0 0x0 0x1 0x1 0x15 0x0 0x4 0x0 0x0 0x0 0x0 0x1 0x10 0x0 0x10 0x60 0x5 0x7 0x68 0x1 0x80 0x80 0xcc 0x

Apr 29 20:45:43 lab131169 vmkernel: 68 0x0 0x0 0x0 0x0 0x0 0x0 0x1

Apr 29 20:45:43 lab131169 vmkernel: VMWARE SCSI Id: Id for vmhba1:0:0 0x60 0x05 0x07 0x68 0x01 0x80 0x80 0xcc 0x68 0x00 0x00 0x00 0x00 0x00 0x00 0x0a 0x32 0x31 0x34 0x35 0x20 0x20

...

Apr 29 20:45:49 lab131169 vmkernel: 0:00:16:52.198 cpu3:1037)SCSI: 8022: vmhba1:0:0:0 status = 8/0 0x0 0x0 0x0

Apr 29 20:45:49 lab131169 vmkernel: 0:00:16:52.198 cpu3:1037)SCSI: 8041: vmhba1:0:0:0 Retry (busy)

and similar.

I don't know if this helps, but in /vmfs/devices/lvm I can see the logical volume listed (along with the 2 other working volumes, which are also listed and mounted in /vmfs/volumes).
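To compare the LUN with what the SAN side reports, the "Id for vmhba1:0:0" line from the log can be turned into a single hex string. This is only a minimal sketch: the function name is hypothetical, and the split into a 16-byte identifier followed by ASCII text is an assumption about the layout, not something the log states.

```python
import re

# Line copied from the vmkernel log excerpt in this post.
LOG_LINE = (
    "Apr 29 20:34:11 lab131169 vmkernel: VMWARE SCSI Id: Id for vmhba1:0:0 "
    "0x60 0x05 0x07 0x68 0x01 0x80 0x80 0xcc 0x68 0x00 0x00 0x00 0x00 0x00 "
    "0x00 0x0a 0x32 0x31 0x34 0x35 0x20 0x20"
)


def decode_scsi_id(line):
    """Return (path, hex_id, trailing_text) parsed from an 'Id for <path>' line."""
    match = re.search(r"Id for (\S+)((?:\s+0x[0-9a-fA-F]{1,2})+)", line)
    if match is None:
        raise ValueError("not a 'VMWARE SCSI Id: Id for' line")
    path = match.group(1)
    data = bytes(int(token, 16) for token in match.group(2).split())
    # Assumption: the first 16 bytes are the device identifier and
    # whatever follows is printable ASCII padded with spaces.
    hex_id = data[:16].hex()
    trailing = data[16:].decode("ascii", errors="replace").strip()
    return path, hex_id, trailing


if __name__ == "__main__":
    path, hex_id, trailing = decode_scsi_id(LOG_LINE)
    print(path, hex_id, trailing)
```

If the hex string matches the volume identifier shown on the SAN side, that would suggest the ESX host is still looking at the same LUN as before the outage.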

weestro · ‎04-30-2008

What kind of SAN storage and what problem exactly? I had something very similar to this happen. My volumes came back after a reboot of the SP that was active.

http://communities.vmware.com/message/930792

christianZ · ‎04-30-2008

Can you post the output from "esxcfg-vmhbadevs -m" here?

Milko · ‎05-01-2008

The following is the output of "esxcfg-vmhbadevs -m":

vmhba1:0:1:1 /dev/sdc1 4815e6d2-da6f2f41-8cf0-00145e6d9e5c

vmhba1:0:3:1 /dev/sdd1 48176c96-2095da71-8a96-00145e6d9e5c

It shows only the other 2 storage devices, which are currently working correctly.
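For anyone checking the same thing on a host with many LUNs, here is a rough sketch that reads the pasted "esxcfg-vmhbadevs -m" output and reports whether a given LUN has a VMFS mapping. The function names are hypothetical, and it assumes each line has exactly three whitespace-separated fields (path, device, volume UUID), as in the output above.

```python
# Output pasted from "esxcfg-vmhbadevs -m" in this post.
SAMPLE_OUTPUT = """\
vmhba1:0:1:1 /dev/sdc1 4815e6d2-da6f2f41-8cf0-00145e6d9e5c
vmhba1:0:3:1 /dev/sdd1 48176c96-2095da71-8a96-00145e6d9e5c
"""


def parse_vmhbadevs(output):
    """Map each vmhba path to its (device, volume UUID) pair."""
    mappings = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        path, device, uuid = fields
        mappings[path] = (device, uuid)
    return mappings


def lun_has_vmfs(mappings, lun_path):
    """True if any partition of lun_path (e.g. 'vmhba1:0:0') is mapped."""
    return any(path.startswith(lun_path + ":") for path in mappings)


if __name__ == "__main__":
    mappings = parse_vmhbadevs(SAMPLE_OUTPUT)
    for lun in ("vmhba1:0:0", "vmhba1:0:1", "vmhba1:0:3"):
        state = "mapped" if lun_has_vmfs(mappings, lun) else "no VMFS mapping"
        print(lun, state)
```

With the output above, vmhba1:0:0 is the only one of the three LUNs without a mapping, which matches what the client shows.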
