Hello,
We are using VMWARE ESXi and have a VM that just crashed.
If you look at the attached screenshot of our VMWARE setup, you'll notice I have opened up DATASTORE1. Inside the SQL VM you will see the following files.
SQL_1.vmdk is 500GB. This is where our SQLVM DB is located. The missing file is SQL_0.vmdk which was about 3.48 GB and that contains our Boot disk. Without the boot disk we cannot boot into the VM and extract anything from SQL_1.vmdk.
Client version: 1.23.0
Client build number: 6506686
ESXi version: 6.5.0
ESXi build number: 7388607
Are we out of luck here? Is it possible to plug in another SSD (1TB) and create a new Datastore to create a new VM and move the SQL_1.vmdk after a new Boot disk is created? Would that work?
I thought I would get on the forums here and ask a guru until we can get connected with VMWARE support directly. THANK YOU for any and all replies.
Jim
Note that our company did have a backup but it was two days old and we are looking to extract data from the SQL VM that crashed to hopefully get the last two days. OR just get it going again.
Jim
Hi Jim
I may be able to help you. Please read
Create a VMFS-Header-dump using an ESXi-Host in production | VM-Sickbay
If you create such a dump I can check whether the vmdk is recoverable.
With ESXi 6.5 the chances are hit or miss.
Provide a download link for the dump and call me via Skype if you want me to look into it asap.
Welcome to the Community,
to me it looks like only some metadata file got lost, which could possibly be recreated.
Please enable SSH on the host, connect to it via e.g. putty, and run the following commands:
cd /vmfs/volumes/datastore1/SQL
ls -elisa > filelist.txt
Then use the datastore browser, and download filelist.txt, SQL.vmx, and vmware.log from the VM's folder. Compress/zip these three files, and attach the .zip archive to a reply post.
André
Thank you VERY much!! I am following your instructions now and will get you the sample asap. Much appreciated.
Jim Atwood
Thanks Andre, I am getting you the data requested for you to have a look. I greatly appreciate your reply, time, and expertise. Both of you gentlemen.
Regards,
Jim Atwood
Plan A: work with Andre
Plan B: if that fails follow me
Hi Andre,
Attached are the requested files. I found out that the cause for our crash was that we ran out of disk space.
The SQL VM is still maxed out, so (if I understand correctly) we will have to create a new VM on the 1TB disk we are adding, to increase the space first, and then hopefully move or resize the SQL VM to give it more room. The boot disk is the issue we are facing now, and we are not sure whether it can be recovered at all.
If you need additional info, please let me know. Thank you!
Regards,
Jim Atwood
I found out that the cause for our crash was that we ran out of disk space.
That kind of explains it. However, this shouldn't cause files to be deleted. Definitely a bug.
Anyway, some questions regarding the data you sent.
André
Hi André,
First, thank you so much for responding and taking time out of your likely very busy day. I greatly appreciate it. Below I provided the answers to your questions along with some additional data screenshots.
Did you try to attach SQL_1.vmdk to another VM?
No, we didn't have disk space and are trying to get more in there.
How much free disk space do you currently have on the datastore?
Not much to recover, but it's listed on the screenshot.
How much free memory/RAM do you currently have on the ESXi host?
On Screenshot
Do you need the existing snapshots (~65GB), which were likely created two days ago?
Not sure, actually. Our last set of data stopped on 8/12. We lost the 8/13 and 8/14 data, and we processed 8/15 manually. Our last backup was taken in the early morning of 8/13 and covers data up to 8/12.
So I am not sure whether the snapshot would contain any new data.
Hi
Must have been too distracted the first time I read your post - but the solution was in your first screenshot:
The datastore browser shows "SQL_0-flat.vmdk" - this means the descriptor is missing or has a major syntax error.
And indeed - it is not present.
Creating one from scratch is easy - but we need the descriptor vmdk SQL_0-000001.vmdk to read two parameters: the CID and the size in sectors.
So please attach SQL_0-000001.vmdk
You should be able to do it yourself:
Edit this text and store it as SQL_0.vmdk
------------------------------------------------------------
Three places to edit are marked in red ....
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID= edit this value - it is the value for parentCID in the descriptor of the snapshot
parentCID=ffffffff
isNativeSnapshot="no"
createType="vmfs"
# Extent description
RW size-in-sectors-edit-here VMFS "SQL_0-flat.vmdk"
# The Disk Data Base
#DDB
ddb.adapterType = "lsilogic"
ddb.geometry.cylinders = "edit this value - it is size in sectors divided by 16065 rounded down"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.virtualHWVersion = "13"
------------------------------------------------------------------
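For reference, the two numeric values in the template can be derived from the byte size of the flat file. The lines below are only a sketch: they assume 512-byte sectors and the 255 heads x 63 sectors geometry from the template (255 x 63 = 16065), the function names are made up for illustration, and the sample size is the one reported for SQL_0-flat.vmdk by ls -l later in this thread.

```shell
#!/bin/sh
# Hypothetical helpers: derive descriptor values from the size of a -flat.vmdk.
# Assumes 512-byte sectors and 255 heads x 63 sectors (16065 sectors per cylinder).

sectors_from_bytes() {
    # size in sectors for the "RW <sectors> VMFS ..." extent line
    echo $(( $1 / 512 ))
}

cylinders_from_sectors() {
    # ddb.geometry.cylinders: sectors / 16065, rounded down (integer division)
    echo $(( $1 / 16065 ))
}

flat_bytes=64424509440   # byte size of SQL_0-flat.vmdk as shown by ls -l
sectors=$(sectors_from_bytes "$flat_bytes")
cylinders=$(cylinders_from_sectors "$sectors")

echo "RW $sectors VMFS \"SQL_0-flat.vmdk\""
echo "ddb.geometry.cylinders = \"$cylinders\""
```

The CID cannot be calculated this way; it still has to be copied from parentCID in the snapshot descriptor.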
Hi there,
Thanks for the reply.
The file that you are requesting "SQL_0-000001.vmdk" is the one unfortunately that is missing from the Datastore1. It somehow disappeared after the crash.
I will try to work through your instructions to create one from scratch.
Jim
I ran the following commands and noticed that the controller type was different. It's lsisas1068 instead of lsilogic. Is that OK?
ls -l SQL_0-flat.vmdk
-rw------- 1 root root 64424509440 Aug 13 14:08 SQL_0-flat.vmdk
less *.vmx | grep -i virtualdev
scsi0.virtualDev = "lsisas1068"
ethernet0.virtualDev = "vmxnet3"
pciBridge4.virtualDev = "pcieRootPort"
pciBridge5.virtualDev = "pcieRootPort"
pciBridge6.virtualDev = "pcieRootPort"
pciBridge7.virtualDev = "pcieRootPort"
Indeed the file SQL_0-000001.vmdk is missing.
Failed to reconfigure virtual machine SQL. Unable to access file [datastore1] SQL/SQL_0-000001.vmdk
By now I think you will have 4 descriptor-vmdks.
Zip em together and let me check.
I also need the size in bytes or sectors of the 2 flat.vmdks to check your values.
Please read the size via WinSCP or PuTTY - I am not interested in the values displayed by the web interface.
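One possible way to read the exact sizes from an SSH session is sketched below. The helper name is hypothetical, and the folder path is simply the one used in the commands earlier in this thread; adjust it if yours differs.

```shell
#!/bin/sh
# Hypothetical helper: print the exact byte size of every *-flat.vmdk in a folder.

flat_sizes() {
    dir=$1
    for f in "$dir"/*-flat.vmdk; do
        [ -e "$f" ] || continue            # no match: the glob stays literal
        # field 5 of "ls -ln" is the size in bytes
        bytes=$(ls -ln "$f" | awk '{print $5}')
        echo "$(basename "$f") $bytes"
    done
}

# Folder path taken from the commands earlier in the thread.
flat_sizes /vmfs/volumes/datastore1/SQL
```

Each output line is a file name followed by its size in bytes; dividing that number by 512 gives the size in sectors.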
Assuming you didn't change the state, we should be able to get the VM up and running again. However, due to the out-of-disk issue, and the fact that the CID chain has been broken for "SQL_1", you may experience some degree of data corruption/loss.
Since there's currently only ~42GB disk space available on the datastore, and ~60GB RAM, we need to do some temporary steps. If you do have the option to shut down one (or more) of the other VMs that might be helpful, as it would free up disk space currently used by the VM's swap files.
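To get a rough idea of how much space the swap files of running VMs occupy, something like the following could be run on the host. This is a sketch only: the helper name is made up, it assumes the VM folders sit directly below the datastore root, and the datastore path is the one used earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: total the bytes held by VM swap files (*.vswp)
# in the folders directly below a datastore root.

vswp_total() {
    total=0
    for f in "$1"/*/*.vswp; do
        [ -e "$f" ] || continue            # no match: the glob stays literal
        # field 5 of "ls -ln" is the size in bytes
        bytes=$(ls -ln "$f" | awk '{print $5}')
        total=$(( total + bytes ))
    done
    echo "$total"
}

# Datastore path taken from the commands earlier in the thread; skipped if absent.
if [ -d /vmfs/volumes/datastore1 ]; then
    df -h                                  # free space per datastore
    echo "swap bytes: $(vswp_total /vmfs/volumes/datastore1)"
fi
```

The total is only an estimate of what shutting down VMs would give back, since a VM's swap file goes away once that VM is powered off.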
Here are the steps:
At that point you should be able to power on the VM, and use it again.
Please remember that snapshots are not comparable with backups. In fact, they can consume considerable disk space (as you've just experienced). I'd strongly recommend that you do NOT keep snapshots, but rather run daily backups.
If you have questions, please feel free to ask.
André
Files to be deleted:
750781508 4096 -rw------- 1 root root 3932672 Wed Aug 14 15:11:02 2019 SQL_0-000001-ctk.vmdk
763364420 8192 -rw------- 1 root root 8192512 Thu Aug 15 19:13:25 2019 SQL_1-000001-ctk.vmdk
16778308 0 -rw-r--r-- 1 root root 0 Fri Aug 16 21:09:15 2019 filelist.txt
369099844 40960 -rw------- 1 root root 41081213 Thu Dec 7 17:46:54 2017 vmmcores-1.gz
394265668 43008 -rw------- 1 root root 43742727 Thu Mar 15 13:38:02 2018 vmmcores-2.gz
364905540 8192 -r-------- 1 root root 8237056 Thu Dec 7 17:46:39 2017 vmx-zdump.000
390071364 8192 -r-------- 1 root root 8122368 Thu Mar 15 13:37:46 2018 vmx-zdump.001
Hi Andre,
Thanks very much for the detailed instructions. I am confused about one point.
When I upload the files from the SQL fixes folder, do I ONLY upload those that are not in the datastore SQL VM?
Both of the following files are already in the SQL VM.
SQL_1.vmdk ( 500GB )
SQL_1-000001.vmdk ( 60.93 )
The two files below are missing.
SQL_0.vmdk
SQL_0-000001.vmdk
Thanks!
Jim
Also, is the step of taking a temporary snapshot strictly necessary, in case the snapshot fails?
If I get "Failed to create snapshot 'xxx' on virtual machine SQL" can I skip that step or does that mean I have another problem?
Thanks!
Jim
Power On VM
Key: haTask-12-vim.VirtualMachine.powerOn-4449
Description: Power On this virtual machine
Virtual machine: SQL
State: Failed - File system specific implementation of Ioctl[file] failed
Errors:
File system specific implementation of Ioctl[file] failed
File system specific implementation of LookupAndOpen[file] failed
File system specific implementation of LookupAndOpen[file] failed
File system specific implementation of LookupAndOpen[file] failed
File system specific implementation of LookupAndOpen[file] failed
File system specific implementation of LookupAndOpen[file] failed
File system specific implementation of LookupAndOpen[file] failed
File system specific implementation of LookupAndOpen[file] failed
File system specific implementation of LookupAndOpen[file] failed
The parent virtual disk has been modified since the child was created. The content ID of the parent virtual disk does not match the corresponding parent content ID in the child
Cannot open the disk '/vmfs/volumes/589ba69b-8042b0c0-14b3-1c98ec129dac/SQL/SQL_1-000001.vmdk' or one of the snapshot disks it depends on.
Module 'Disk' power on failed.
Failed to start the virtual machine.
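The content ID message above refers to the CID and parentCID lines in the descriptor files. A quick way to see whether a parent/child pair lines up is sketched below; the function names are hypothetical, and the sketch assumes the descriptors are the small text files with CID= and parentCID= lines as in the template posted earlier.

```shell
#!/bin/sh
# Hypothetical helpers: compare a snapshot descriptor's parentCID with its parent's CID.

descriptor_value() {
    # $1 = key (CID or parentCID), $2 = descriptor file
    # Descriptor lines look like "CID=xxxxxxxx"; print the part after "=".
    sed -n "s/^$1=//p" "$2" | tr -d '\r' | head -n 1
}

check_cid_chain() {
    # $1 = parent descriptor, $2 = child (snapshot) descriptor
    parent_cid=$(descriptor_value CID "$1")
    child_parent_cid=$(descriptor_value parentCID "$2")
    if [ -n "$parent_cid" ] && [ "$parent_cid" = "$child_parent_cid" ]; then
        echo "match: $parent_cid"
    else
        echo "MISMATCH: parent CID=$parent_cid, child parentCID=$child_parent_cid"
        return 1
    fi
}
```

From the VM's folder this would be called as check_cid_chain SQL_1.vmdk SQL_1-000001.vmdk. It only reads the two descriptors and changes nothing.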
