VMware Cloud Community
TangoMan
Contributor

ESX Server 3i 3.5.0 RAID failure

Hi All,

I am guessing this has been asked before but I cannot see anything that quite matches the point I am now at.

I was running ESX quite happily until this week, when all 3 disks in my 3-disk RAID 5 array appear to have started failing at the same time. The system would boot and run for a while but would eventually fall over. I copied all the data from the datastore to a Windows workstation via the Infrastructure Client. I am just not sure how to get those virtual machines back up and running on a new installation of ESX, as it seems that some files were renamed during the copy? It does not seem to be as simple as copying the files back to the new datastore.

Regards

24 Replies
a_p_
Leadership

Welcome to the Community,

as it seems that during the copy some files were renamed?

I assume you are talking about the VMDK files. In ESX, virtual disks consist of two files: the descriptor file (<vmname>.vmdk) and the data file (<vmname>-flat.vmdk, or <vmname>-00000x-delta.vmdk for snapshots). The datastore browser hides the data files and shows only the descriptor files (with the size of the data file).
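
The naming convention described above can be sketched in a few lines of Python, for example to sort out a folder that was copied to a workstation. This is only an illustration: the function names and the VM name "myvm" are made up, and the guess is based on file names alone. The authoritative link between a descriptor and its data file is the extent line inside the descriptor itself.

```python
import re

# Naming convention from the post above: "<vmname>.vmdk" and
# "<vmname>-00000x.vmdk" are small text descriptors, while "-flat" and
# "-delta" files hold the actual disk data.
DATA_SUFFIX = re.compile(r"-(flat|delta)\.vmdk$")


def split_vmdk_files(filenames):
    """Return (descriptors, data_files) for a list of file names."""
    vmdks = [name for name in filenames if name.endswith(".vmdk")]
    data_files = sorted(n for n in vmdks if DATA_SUFFIX.search(n))
    descriptors = sorted(n for n in vmdks if not DATA_SUFFIX.search(n))
    return descriptors, data_files


def expected_data_file(descriptor):
    """Guess, by name only, which data file belongs to a descriptor."""
    stem = descriptor[: -len(".vmdk")]
    # Snapshot descriptors end in a six-digit counter, e.g. "myvm-000002".
    kind = "delta" if re.search(r"-\d{6}$", stem) else "flat"
    return f"{stem}-{kind}.vmdk"


# "myvm" is a hypothetical VM name used only for this example.
files = ["myvm.vmdk", "myvm-flat.vmdk", "myvm-000001.vmdk",
         "myvm-000001-delta.vmdk", "myvm.vmx"]
descriptors, data_files = split_vmdk_files(files)
for d in descriptors:
    print(d, "->", expected_data_file(d))
```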

André

PS: Discussion moved from VI: VMware ESX® 3.0 to VI: VMware ESXi™ 3.5

TangoMan
Contributor

OK, as an example, one of the folders on the Windows machine contains the following:

28/08/2011  01:05     3,020,105,728 gnuworld-000002-delta.vmdk
28/08/2011  01:01               265 gnuworld-000002.vmdk
28/08/2011  01:06     1,258,498,048 gnuworld-000003-delta.vmdk
28/08/2011  01:05               265 gnuworld-000003.vmdk
28/08/2011  01:08       335,751,168 gnuworld-000004-delta.vmdk
28/08/2011  01:08               265 gnuworld-000004.vmdk
28/08/2011  01:10       637,741,056 gnuworld-000005-delta.vmdk
28/08/2011  01:09               265 gnuworld-000005.vmdk
28/08/2011  02:17   107,374,182,400 gnuworld-flat.vmdk
28/08/2011  01:01     1,074,871,149 gnuworld-Snapshot1.vmsn
28/08/2011  01:08     1,074,871,293 gnuworld-Snapshot3.vmsn
28/08/2011  01:09     1,074,831,448 gnuworld-Snapshot4.vmsn
28/08/2011  01:11     1,074,831,448 gnuworld-Snapshot5.vmsn
28/08/2011  00:59             8,684 gnuworld.nvram
28/08/2011  01:12               403 gnuworld.vmdk
28/08/2011  00:59             2,000 gnuworld.vmsd
28/08/2011  00:59             2,405 gnuworld.vmx
28/08/2011  00:59               263 gnuworld.vmxf
28/08/2011  22:16                 0 list.txt
28/08/2011  00:59            52,432 vmware-12.log
28/08/2011  00:59            25,556 vmware-13.log
28/08/2011  00:59            21,156 vmware-14.log
28/08/2011  00:59            24,775 vmware-15.log
28/08/2011  00:59            25,439 vmware-16.log
28/08/2011  00:59            25,439 vmware-17.log
28/08/2011  00:59            24,392 vmware.log

I have fitted new hard disks to the servers and done a fresh install of ESXi 3.5, so I just need to know how to get the data back onto the new install so that it can use my existing virtual machines.

a_p_
Leadership

You can do this the same way as you downloaded the files. Open the datastore browser, create a new folder with the VM's name and upload the files to this folder. Once uploaded, right click the vmx file and select add to inventory.

André

TangoMan
Contributor

I am uploading the VMs at the moment and will let you know how it goes. A couple of other things I have been thinking about: I know the version I am using is quite old, but the hardware I have is only 32-bit, and IIRC I was able to download it for free with a licence. Are all the newer versions 64-bit only? My other question: the server is a Supermicro with a dual Intel Xeon 2.8 GHz board, which has 6 DIMM slots and supports a maximum of 12 GB of RAM. I am considering buying DIMMs to take it up to 12 GB, but under normal circumstances a 32-bit OS cannot access more than 4 GB of RAM. Does ESXi have some special tricks that allow it to access the full amount of RAM in the server?

a_p_
Leadership

You shouldn't have any issues with the memory as long as your hardware supports it. ESXi 3.5 supports up to 256GB of physical memory (see http://www.vmware.com/pdf/vi3_35/esx_3/r35/vi3_35_25_config_max.pdf)

Regarding newer versions. If the current ESXi 3.5 version runs on your system without issues, I'd recommend you stick with it. It is possible to run ESXi 4.x on 32 bit hardware - although not supported - but free licenses are not available anymore (since the release of vSphere 5).

A few words about your setup and performance. I'm not sure whether you are familiar with RAID levels and how they work, but a RAID 5 with only 3 disks is one of the slowest configurations you can have. With RAID 5, each write access goes to 2 disks simultaneously (data + parity). With 4 or more disks, multiple writes can be done at the same time to different "pairs" of disks. You should also consider deleting snapshots if you don't need them anymore. Snapshots in ESXi are used in a chain, which means all VMDKs are in use, and for each data block the hypervisor has to find the VMDK that holds the most current copy. This can slow down the virtual machines (see http://kb.vmware.com/kb/1015180).
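
To make the snapshot-chain lookup more concrete, here is a toy model in Python. It is not how the VMkernel implements it; disks are plain dictionaries and all names are invented for the illustration. It only shows why a longer chain means more places to look before a block is found.

```python
def read_block(chain, block_number):
    """Return (data, lookups) for the newest copy of a block.

    `chain` is ordered newest snapshot first, base disk last. Each disk is a
    dict mapping block numbers to data. A delta only contains blocks that
    were written while it was the active disk; the base contains all blocks.
    """
    for lookups, disk in enumerate(chain, start=1):
        if block_number in disk:
            return disk[block_number], lookups
    raise KeyError(block_number)


# Hypothetical contents, for illustration only.
base = {0: "boot", 1: "old-data", 2: "logs"}
snapshot_1 = {1: "new-data"}   # block 1 was rewritten after snapshot 1
snapshot_2 = {}                # nothing written since snapshot 2

chain = [snapshot_2, snapshot_1, base]

print(read_block(chain, 1))  # found in snapshot_1 after 2 lookups
print(read_block(chain, 0))  # only in the base disk, 3 lookups
```

In this model, deleting (committing) snapshots shortens the chain, so fewer lookups are needed per block.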

André

EDIT: Sorry, ESXi 4.x only runs on 64-bit hardware. It is possible to run it on 64-bit CPUs without VT capabilities, but then you are limited to 32-bit guests.

TangoMan
Contributor

I think I am almost there. I imported one virtual machine as you suggested, but it failed to boot, saying a file was missing. It seems the -000001.vmdk file and the corresponding delta file are missing. Changing the hard disk to point to the large .vmdk file gets the machine booting. Is there any way to use my existing snapshots to try to get the machine to a more up-to-date state?

As for the RAID performance, I have to say I never gave it a second thought; I had 3 disks and just fired them in. As the server case can only take 3 hard disks, and as there is no need for massive amounts of storage on the VMware host machine, I have now fitted 3 x 500 GB drives and configured them as a mirror with a hot spare.

a_p_
Leadership

changing the hard disk to point to the large .vmdk file gets the machine booting

That wasn't a good idea. Now the snapshot chain is broken and needs to be fixed. Depending on the data modified while the VM was running from the base disk, there could be some data corruption.

Do you still have the vmware-xx.log file in the VM's folder which contains the error message when you initially started the VM?

André

TangoMan
Contributor

Message was edited by: a.p. - replaced pasted log file by an attachment for better readability.

a_p_
Leadership

It looks like the 000001.vmdk snapshot got lost. Depending on its size and which data was modified while it was active, this most likely means you lost data. Unless there is a way to recover it from the old disks, we could chain the other disks together without snapshot 1 and clone them to a new virtual disk to see what we get.

To do this we need to modify the vmx file as well as one of the vmdk files. You also need sufficient free disk space for the clone (the size of the original base/flat disk). If you want to try this, please attach the vmx file, and the small vmdk files. If possible archive/compress these files and attach them in a zip file.

André

TangoMan
Contributor

Let me try to recover the missing file right now; if I can't, I will post the files you wanted.

a_p_
Leadership

Actually 2 files are missing, the gnuworld-000001.vmdk header file as well as the gnuworld-000001-delta.vmdk data file.
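
A quick way to spot such a gap from a directory listing is sketched below in Python. The function name is made up, and the check assumes snapshot files are numbered consecutively from 000001, which ESX does not guarantee (numbers can be skipped after snapshots are deleted). A reported gap is therefore only a hint; the parentFileNameHint entries in the descriptors are what actually define the chain.

```python
import re

SNAPSHOT_DELTA = re.compile(r"^(?P<vm>.+)-(?P<num>\d{6})-delta\.vmdk$")
SNAPSHOT_DESCRIPTOR = re.compile(r"^(?P<vm>.+)-(?P<num>\d{6})\.vmdk$")


def missing_snapshot_files(filenames, vm_name):
    """List snapshot files absent between 000001 and the highest number seen."""
    present = set(filenames)
    numbers = set()
    for name in filenames:
        match = SNAPSHOT_DELTA.match(name) or SNAPSHOT_DESCRIPTOR.match(name)
        if match and match.group("vm") == vm_name:
            numbers.add(int(match.group("num")))

    missing = []
    for num in range(1, max(numbers, default=0) + 1):
        for suffix in (".vmdk", "-delta.vmdk"):
            candidate = f"{vm_name}-{num:06d}{suffix}"
            if candidate not in present:
                missing.append(candidate)
    return missing


# VMDK file names taken from the directory listing earlier in the thread.
listing = [
    "gnuworld-000002-delta.vmdk", "gnuworld-000002.vmdk",
    "gnuworld-000003-delta.vmdk", "gnuworld-000003.vmdk",
    "gnuworld-000004-delta.vmdk", "gnuworld-000004.vmdk",
    "gnuworld-000005-delta.vmdk", "gnuworld-000005.vmdk",
    "gnuworld-flat.vmdk", "gnuworld.vmdk",
]
print(missing_snapshot_files(listing, "gnuworld"))
```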

André

TangoMan
Contributor

OK, it looks like the old RAID set is completely dead, so here are the files you wanted.

Regards

a_p_
Leadership

I attached a modified gnuworld-000002.vmdk in which the parent file now points to gnuworld.vmdk (and its CID) instead of the missing gnuworld-000001.vmdk. I assume the files were the latest ones from the datastore after you powered on the VM with the base disk attached? If not, you may need to replace the parentCID in gnuworld-000002.vmdk with the value of CID found in gnuworld.vmdk.
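
For anyone who wants to make the same kind of descriptor edit by script rather than in a text editor, a minimal Python sketch follows. The function name is invented and the CID values in the example are placeholders, not the real ones from this VM. Work on a copy of the descriptor and keep the original.

```python
import re


def repoint_parent(descriptor_text, new_parent_file, new_parent_cid):
    """Return descriptor text whose parent fields point at another disk."""
    text, cid_hits = re.subn(
        r"(?m)^parentCID=[^\r\n]*",
        lambda _m: f"parentCID={new_parent_cid}",
        descriptor_text,
    )
    text, hint_hits = re.subn(
        r"(?m)^parentFileNameHint=[^\r\n]*",
        lambda _m: f'parentFileNameHint="{new_parent_file}"',
        text,
    )
    if cid_hits != 1 or hint_hits != 1:
        raise ValueError("expected exactly one parentCID and one parentFileNameHint")
    return text


# Placeholder descriptor: "11111111" and "22222222" are made-up CIDs.
snapshot_descriptor = (
    "# Disk DescriptorFile\n"
    "version=1\n"
    "CID=33333333\n"
    "parentCID=11111111\n"
    'createType="vmfsSparse"\n'
    'parentFileNameHint="gnuworld-000001.vmdk"\n'
)
print(repoint_parent(snapshot_descriptor, "gnuworld.vmdk", "22222222"))
```

Only the two parent fields change; the descriptor's own CID and its extent line are left untouched.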

What you should do with the "fixed" snapshot chain is to create a clone of the virtual disk. To be able to do this you will need another 100GB of free disk space!
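
The 100 GB figure follows from the size of the base data file in the listing earlier in the thread. A small Python calculation, using only byte counts from that listing, shows where it comes from. The helper names are made up, and it assumes the clone is written at the full provisioned size of the base disk, as stated above.

```python
GIB = 2**30


def gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / GIB


def clone_fits(free_bytes, provisioned_bytes):
    """True if the datastore has room for a full-size clone."""
    return free_bytes >= provisioned_bytes


# Sizes from the directory listing posted earlier in the thread.
base_flat_bytes = 107_374_182_400                      # gnuworld-flat.vmdk
delta_bytes = [3_020_105_728, 1_258_498_048,           # snapshots 2 and 3
               335_751_168, 637_741_056]               # snapshots 4 and 5

print(f"base disk: {gib(base_flat_bytes):.0f} GiB")
print(f"deltas:    {gib(sum(delta_bytes)):.2f} GiB")
```

The deltas only add up to a few GiB, but the clone merges the chain into one new disk as large as the base, so it is the base size that decides how much free space is needed.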

To clone the disk run the following command from the VM's folder on the command line:

vmkfstools -i gnuworld-000005.vmdk gnuworld-clone.vmdk

Once the disk is cloned, you can attach this disk to the virtual machine and try to boot from it or boot the VM from a live CD and check the files on the cloned disk.

André

TangoMan
Contributor

Either I got something wrong or the files were somehow corrupted on the old disk array: the command gets to 1% and dies with "Bad file descriptor" (589833).

a_p_
Leadership

Did you verify that the value of parentCID in snapshot 2 matches the value of CID in the base disk's descriptor file? If not, please modify the entry in the snapshot 2 descriptor file.
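
The check asked for above can also be scripted. The following Python sketch compares each descriptor's parentCID with the CID of the next descriptor in the chain. The function names and the CID values in the example are placeholders, and it assumes a base disk carries parentCID=ffffffff.

```python
import re


def parse_descriptor(text):
    """Extract CID, parentCID and parentFileNameHint from descriptor text."""
    fields = {}
    for key in ("CID", "parentCID", "parentFileNameHint"):
        match = re.search(rf'(?m)^{key}\s*=\s*"?([^"\r\n]*)"?', text)
        if match:
            fields[key] = match.group(1).strip()
    return fields


def check_chain(descriptors):
    """Check parent links; `descriptors` is ordered child first, base last.

    Returns a list of problems. An empty list means every link matches.
    """
    parsed = [parse_descriptor(text) for text in descriptors]
    problems = []
    for child, parent in zip(parsed, parsed[1:]):
        if child.get("parentCID") != parent.get("CID"):
            problems.append(
                f"parentCID {child.get('parentCID')} does not match "
                f"parent CID {parent.get('CID')}"
            )
    # Assumption: a base disk has no parent and carries parentCID=ffffffff.
    if parsed and parsed[-1].get("parentCID") != "ffffffff":
        problems.append("last descriptor is not a base disk")
    return problems


# Placeholder descriptors with made-up CIDs.
snapshot = 'CID=bbbbbbbb\nparentCID=aaaaaaaa\nparentFileNameHint="vm.vmdk"\n'
base = "CID=aaaaaaaa\nparentCID=ffffffff\n"
print(check_chain([snapshot, base]))
```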

André

TangoMan
Contributor

Ok

# Disk DescriptorFile
version=1
CID=dd21d558
parentCID=5c3d9592
createType="vmfsSparse"
parentFileNameHint="gnuworld.vmdk"
# Extent description
RW 209715200 VMFSSPARSE "gnuworld-000002-delta.vmdk"
# The Disk Data Base
#DDB
ddb.toolsVersion = "2147483647"

# Disk DescriptorFile
version=1
CID=5c3d9592
parentCID=ffffffff
createType="vmfs"
# Extent description
RW 209715200 VMFS "gnuworld-flat.vmdk"
# The Disk Data Base
#DDB
ddb.virtualHWVersion = "4"
ddb.uuid = "60 00 C2 9d 62 b2 9e cd-2c 0b a6 ff 82 d0 b1 af"
ddb.geometry.cylinders = "13054"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"
ddb.toolsVersion = "8192"

that is what I have

a_p_
Leadership

This looks OK. What you can try is to create the clone from the second-newest snapshot (in case the current one was corrupted due to the RAID failure):

vmkfstools -i gnuworld-000004.vmdk gnuworld-clone.vmdk

André

TangoMan
Contributor

OK, that seems to be running now. It is only 9% in, so fingers crossed. I would like to thank you for your patience and your help with this problem. Hopefully this will sort this VM out; of all the VMs on the machine, this is the only one that I cannot afford to lose.

a_p_
Leadership

You are absolutely welcome.

Let's see what we can get back from this VM. Unfortunately the data stored in the latest snapshot file (~630 MB) will be lost and, depending on the data that was stored in the missing snapshot, the VM might show data corruption and/or lost data.

Do you have a backup of the data?

André
