VMware Cloud Community
JCGIBSON
Contributor

Oops! How do I swing a lun from a dead ESX 3.5 server to a live one?

(reposted from a different forum)

First, let me start by saying that I'm almost completely new to administering an ESX 3.5 host.

I built 2 ESX servers, both 3.5, vanilla out-of-the-box install.

I have a Clariion AX150, and I presented two 350GB LUNs to each box for storage pools.

I built most of my VMs on the first box, but due to some networking issues we were having, a couple of my folks built 3 VMs on the 2nd ESX box.

As fate would have it, I needed to reclaim the 2nd esx box and repurpose it for something else.

Not knowing any different, I figured I could just shut down #2 esx, remask the lun from there to #1 esx, and rescan to recover the lun, and rediscover the VMs on the primary esx server.

Well, that's clearly not the case. The lun is now showing up as /dev/sdb and "does not have a valid disk signature".

How can I recover this LUN? This is the 2nd time they have built these VMs on that host, and (duh) I don't think we have a backup of those VMs.

Am I completely hosed? or is there a way to get #1 esx to snap out of it and recover this lun?

If I wipe out what I put on the 2nd box and reinstall ESX 3.5 there, will I be able to reimport this volume group and recover so I can PROPERLY move the VMs?

In a nutshell, how do I swing luns from one ESX host to another, if I DO NOT have vmotion installed?

-JG

10 Replies
chrisAMS
Enthusiast

I'm not sure I fully understand the question, but...

I think you should remove the storage from the old ESX host and then add it to the new one.

Do you have SSH access to the ESX host?

Why don't you just shut down all the VMs and copy them?

JCGIBSON
Contributor

I had no time to copy the VMs, or to back them up.

I had to rebuild esx2 on the spot, so I shut down all VMs, then shut down the box.

I went into navisphere and reassigned that lun to esx1.

I rebooted esx1, and now the lun shows up as /dev/sdb with a bad signature.

I need to recover this lun on esx1, so I can recover the VMs.

esx2 has been rebuilt with a completely different OS at this point, for a short-term project.

CAN I recover the lun that I reassigned to esx1? if I "add storage" it will reformat, and I can't have that. I'll lose the VMs.

If I rebuild esx2 with 3.5 when the project is done, and reassign the LUN back to esx2, will it be able to recover the LUN there?

I need to know more about how ESX handles volume groups, as that is what it appears to be doing with LUNs when you "add storage" from the VI client, though there are no vgscan or vgimport commands in the CLI (so it's not standard Linux LVM).

Can I recover? and how do I do it? (I do NOT have Vmotion or any shared disk feature configured).

-JG

jayscarff
Contributor

Hi,

Do you have VC running? Was the LUN a VMFS datastore? You may have to go into the advanced settings and change one of the settings there. This will tell ESX that the LUN is already a VMFS datastore and it should mount it - something to do with the signatures on the disks. I'm not 100% sure, though.

J

JCGIBSON
Contributor

Yes, I have a VC running.

Do you have any details about what I need to do here? Or at least a section of the docs that I can reference?

I found a Fibre Channel guide for ESX and there was a section on re-signaturing disks, but it didn't seem to address this case.

From VC, there's precious little in the GUI with regard to storage, so I was working on the ESX box directly with the VI client.

conradsia
Hot Shot

Try going into the advanced settings on the Configuration tab, click on LVM, and set LVM.EnableResignature to 1. Then rescan the LUN.

I would remove the lun first, reboot if possible and then re-add the lun.
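
For anyone who prefers the service console to the VI client, here is a minimal sketch of the same toggle-and-rescan sequence. It only prints the commands so they can be reviewed and pasted in by hand. The adapter name vmhba1 is an assumption (check which vmhba your LUN sits behind), and the helper name resignature_commands is made up for this example. Keep in mind that resignaturing writes a new signature to the volume, so treat it as a deliberate step rather than something to leave enabled.

```python
def resignature_commands(adapter="vmhba1"):
    """Return the ESX 3.x service-console commands for a one-off resignature pass.

    The adapter name is an assumption; substitute the HBA that sees the LUN.
    """
    return [
        # Turn resignaturing on (same setting as LVM.EnableResignature in the VI client)
        "esxcfg-advcfg -s 1 /LVM/EnableResignature",
        # Rescan the HBA so the host re-reads the LUN
        "esxcfg-rescan %s" % adapter,
        # Turn resignaturing back off once the volume has been picked up
        "esxcfg-advcfg -s 0 /LVM/EnableResignature",
    ]


if __name__ == "__main__":
    for command in resignature_commands():
        print(command)
```

Printing rather than executing is intentional here: it keeps the script usable from any workstation and leaves the decision to run each command with the admin.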

jayscarff
Contributor

Below is what I followed when I blew away the RAID config on the HBA (dumbass). The disks stayed the same, but needed to be seen by a new host that was built and added to the array. You may not need the stuff at the bottom. As long as it's a VMFS volume you should be sweet!

Good luck.

J

This procedure is used to allow you to change the Host Mode setting/director flags on your array and make all of the VMFS3 volumes visible again.

1. Stop the running VMs on all the ESX servers.

2. Change the Host Mode/Director flags on the Storage Array - now when you rescan, you will see snapshot LUN mentioned in /var/log/vmkernel.

3. Enable LVM Resignaturing on the first ESX server => set LVM.EnableResignature to 1.

-log in to the ESX host with the VI client

-select the configuration tab

-select the Advanced setting option

-select the LVM section

-make sure that the fourth and last option, LVM.EnableResignature, is set to 1.

-save the change

-select storage adapter

-select rescan adapter

-leave the default option and proceed

-you should now be able to see the VMFS

4. Disable LVM Resignaturing

-log in to the ESX host with the VI client

-select the configuration tab

-select the Advanced setting option

-select the LVM section

-make sure that the fourth and last option, LVM.EnableResignature, is set to 0.

-save the change

5. No snapshot messages should now be visible in the /var/log/vmkernel.

6. Re-label the volume

-log in to the ESX host with the VI client

-select Datastores view in inventory view

-select the datastore, right click, select remove to remove the old label as this is associated with the old UUID of the volume

-select Hosts & Clusters view instead of Datastores view

-in the summary tab, you should see the list of datastores

-click in the name field for the volume in question and change it to the original name - you now have the correct original label associated with the resignatured volume

7. Now rescan from all ESX servers

8. Re-register all the VMs

-Because the VMs will be registered against the old UUID, you will need to re-register them in VC.

-log in to the ESX host with the VI client

-select the configuration tab

-select Storage (SCSI, SAN & NFS)

-double-click on any of the datastores to open the Datastore Browser

-navigate to the .vmx file of any of the VMs by clicking on the folders

-right click, select 'add to inventory'

9. Remap any RDMs

-If you have a VM which uses an RDM, you will have to recreate the mapping.

-the problem here is that you may not be able to identify which RDM is which if you used multiple ones.

-if they are different sizes, then this is ok - you should be able to map them in the correct order by their size

-make a note of the sizes of the RDMs and which VMs they are associated with before starting this process

-make a note of the LUN ID before starting this process too - you may be able to use this to recreate the mapping

-if they are all the same size, this is a drag since you will have to map them and boot the VM, and then check them

-if you do not use RDMs, you can ignore this step

10. Powering on the VMs

-start the VM, reply yes if prompted about a new UUID

-if any of the VMs refer to missing disks when they power up, check the .vmx file and ensure that the SCSI disk references are not made against the old UUID instead of the label.

-if any of the VMs refer to missing disks when they power up, check the .vmx file and ensure that the SCSI disk references are not made against the old label instead of the new label, if you changed it.

11. Repeat steps 3 thru 10 for all subsequent ESX servers that are still seeing snapshot volumes.

-if all ESX servers share the same volumes, then this step will not be necessary
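
The .vmx check in step 10 can be sketched in a few lines of Python. This is only an illustration, assuming the disk entries use the usual scsiN:M.fileName = "..." form and that a stale reference shows up as a full /vmfs/volumes/<uuid>/ path. The function name stale_disk_refs and the sample UUIDs are made up for the example.

```python
import re

# VMFS volume UUIDs appear in paths as 8-8-4-12 hex digits
UUID_PATH = re.compile(
    r"/vmfs/volumes/([0-9a-f]{8}-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{12})/",
    re.IGNORECASE,
)
# Matches lines such as: scsi0:0.fileName = "vm1.vmdk"
DISK_LINE = re.compile(r'^\s*(scsi\d+:\d+\.fileName)\s*=\s*"([^"]*)"')


def stale_disk_refs(vmx_text, current_uuid):
    """Return (key, path) pairs whose path names a volume UUID other than current_uuid."""
    stale = []
    for line in vmx_text.splitlines():
        match = DISK_LINE.match(line)
        if not match:
            continue
        key, path = match.groups()
        found = UUID_PATH.search(path)
        # Relative paths carry no UUID and are left alone
        if found and found.group(1).lower() != current_uuid.lower():
            stale.append((key, path))
    return stale


if __name__ == "__main__":
    sample = (
        'scsi0:0.present = "TRUE"\n'
        'scsi0:0.fileName = "/vmfs/volumes/11111111-22222222-3333-444444444444/vm1/vm1.vmdk"\n'
        'scsi0:1.fileName = "vm1_1.vmdk"\n'
    )
    for key, path in stale_disk_refs(sample, "aaaaaaaa-bbbbbbbb-cccc-dddddddddddd"):
        print("%s still points at %s" % (key, path))
```

Any entry it reports would need its path edited to the new UUID or to the datastore label before the VM will find its disk.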

JCGIBSON
Contributor

I'm with you up to step 6.

I rescanned, then reset resignaturing to 0, then logged in to VC (you had a typo here), went to Inventory->Datastores, and it's still not there.

Do I need to reboot this server after the rescan?

JCGIBSON
Contributor

Still nothing after a reboot.

I can see the device, and if I go to Configuration->Storage->Add Storage, it's there.

But if I follow that wizard, it will reformat the disk, and my VMs will be lost. There's no apparent documentation on how to get out of this mess :(

So I'm back to my original question, part 2: If I simply wipe out what I did yesterday, reinstall ESX 3.5 on the esx2 server, and swing the LUN back, will it all heal itself? I guess the better question is: does ESX base the UUID you mentioned on the HBA serial number/WWNN/WWPN, or is it a random number?

jayscarff
Contributor

Hey,

I don't think putting ESX back onto the 2nd server will make a difference. You could try, it only takes a few minutes to install. You'll have to use the same name, I guess.

Step 6 not working at all?

JCGIBSON
Contributor

Well, up to step 6, I did everything as indicated, and no errors...

But I never saw the disk show up as a datastore object I could manipulate.

-JG
