VMware Cloud Community
jdtwood
Contributor

Help - VM Crashed and not sure if recoverable.

Hello,

We are using VMware ESXi and have a VM that just crashed.

If you look at the attached screenshot of our VMware setup, you'll see that I have opened DATASTORE1. Inside the SQL VM folder you will see the following files.

SQL_1.vmdk is 500 GB. This is where our SQL VM database is located. The missing file is SQL_0.vmdk, which was about 3.48 GB and contains our boot disk. Without the boot disk we cannot boot into the VM and extract anything from SQL_1.vmdk.

Client version:  1.23.0

Client build number:  6506686

ESXi version: 6.5.0

ESXi build number:  7388607

Are we out of luck here? Would it work to plug in another SSD (1 TB), create a new datastore on it, create a new VM with a new boot disk, and then move SQL_1.vmdk to it?

I thought I would ask a guru on the forums here until we can get connected with VMware support directly. THANK YOU for any and all replies.

Jim

25 Replies
jdtwood
Contributor

Note that our company did have a backup, but it was two days old. We are hoping to extract data from the crashed SQL VM to recover the last two days, or simply to get the VM running again.

Jim

continuum
Immortal

Hi Jim

I may be able to help you. Please read:

Create a VMFS-Header-dump using an ESXi-Host in production | VM-Sickbay

If you create such a dump, I can check whether the vmdk is recoverable.

With ESXi 6.5 the chances are hit or miss.

Provide a download link for the dump and call me via Skype if you want me to look into it ASAP.


________________________________________________
Do you need support with a VMFS recovery problem ? - send a message via skype "sanbarrow"
I do not support Workstation 16 at this time ...

a_p_
Leadership

Welcome to the Community,

to me it looks like only some metadata file got lost, which could possibly be recreated.

Please enable SSH on the host, connect to it using e.g. PuTTY, and run the following commands:

cd /vmfs/volumes/datastore1/SQL

ls -elisa > filelist.txt

Then use the datastore browser, and download filelist.txt, SQL.vmx, and vmware.log from the VM's folder. Compress/zip these three files, and attach the .zip archive to a reply post.


André

jdtwood
Contributor

Thank you VERY much!!  I am following your instructions now and will get you the sample asap.  Much appreciated.

Jim Atwood

jdtwood
Contributor

Thanks Andre, I am getting you the data you requested. I greatly appreciate your reply, time, and expertise. That goes for both of you gentlemen.

Regards,

Jim Atwood

continuum
Immortal

Plan A: work with Andre

Plan B: if that fails follow me


jdtwood
Contributor

Hi Andre,

Attached are the requested files.  I found out that the cause for our crash was that we ran out of disk space.

The SQL VM is still maxed out, so (if this is correct) we will have to create a new VM on a 1 TB disk that we are adding first to increase the space, and then hopefully move or resize the SQL VM to give it more space. The boot disk is the issue we are facing now, and we are not sure whether we can recover it at all.

If you need additional info, please let me know.  Thank you!

Regards,

Jim Atwood 

a_p_
Leadership

"I found out that the cause for our crash was that we ran out of disk space."

That kind of explains it. However, running out of disk space shouldn't cause files to be deleted. Definitely a bug.

Anyway, some questions regarding the data you sent.

  • Did you try to attach SQL_1.vmdk to another VM? It has a later timestamp than its snapshot. In that case we may need to fix the snapshot chain. To verify this, use e.g. WinSCP to download the two small descriptor files, SQL_1.vmdk and SQL_1-000001.vmdk.
  • How much free disk space do you currently have on the datastore?
  • How much free memory/RAM do you currently have on the ESXi host?
  • Do you need the existing snapshots (~65GB), which were likely created two days ago?

André

jdtwood
Contributor

Hi André,

First, thank you so much for responding and taking time out of your likely very busy day.  I greatly appreciate it.  Below I provided the answers to your questions along with some additional data screenshots. 

"Did you try to attach SQL_1.vmdk to another VM?"

No, we didn't have the disk space and are trying to add more.

"How much free disk space do you currently have on the datastore?"

Not much, but it's listed on the screenshot.

"How much free memory/RAM do you currently have on the ESXi host?"

See the screenshot.

"Do you need the existing snapshots (~65GB), which were likely created two days ago?"

Not sure, actually. Our last set of data stopped on 8/12. We lost the 8/13 and 8/14 data, and we processed 8/15 manually. Our last backup was taken in the early morning of 8/13 and covers data up to 8/12.

So I am not sure whether the snapshot would have any new data.

continuum
Immortal

Hi

I must have been too distracted the first time I read your post, but the solution was in your first screenshot:

The datastore browser shows "SQL_0-flat.vmdk". The browser normally hides a flat file behind its descriptor, so seeing it listed means the descriptor is missing or has a major syntax error.

And indeed, it is not present.

Creating one from scratch is easy, but we need the snapshot's descriptor, SQL_0-000001.vmdk, to read two parameters: the CID and the size in sectors.

So please attach SQL_0-000001.vmdk
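
For anyone following along, the two parameters can be pulled out of a snapshot descriptor with a few lines of Python. This is only a minimal sketch: the sample descriptor text, its CID values, and the helper name `read_snapshot_params` are hypothetical, and the extent line may read `VMFSSPARSE` or `SESPARSE` depending on the datastore, which is why the pattern accepts any extent type.

```python
import re

# Hypothetical snapshot descriptor, for illustration only; real values differ.
SAMPLE_SNAPSHOT_DESCRIPTOR = '''# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=9f8e7d6c
parentCID=1a2b3c4d
isNativeSnapshot="no"
createType="vmfsSparse"
parentFileNameHint="SQL_0.vmdk"
# Extent description
RW 125829120 VMFSSPARSE "SQL_0-000001-delta.vmdk"
'''


def read_snapshot_params(descriptor_text):
    """Return (parent_cid, size_in_sectors) read from a descriptor's text."""
    parent = re.search(r'^parentCID=([0-9a-fA-F]{8})\s*$',
                       descriptor_text, re.MULTILINE)
    # Extent line layout: access mode, size in sectors, extent type, file name
    extent = re.search(r'^RW\s+(\d+)\s+\S+\s+"[^"]+"',
                       descriptor_text, re.MULTILINE)
    if parent is None or extent is None:
        raise ValueError("descriptor is missing parentCID or an RW extent line")
    return parent.group(1).lower(), int(extent.group(1))


parent_cid, sectors = read_snapshot_params(SAMPLE_SNAPSHOT_DESCRIPTOR)
print("CID to use in the base descriptor:", parent_cid)
print("size in sectors:", sectors)
```

The snapshot's parentCID becomes the CID of the rebuilt base descriptor, and the extent size is the same for the snapshot and its base disk.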


continuum
Immortal

You should be able to do it yourself:

Edit this text and store it as SQL_0.vmdk

------------------------------------------------------------

The three places to edit are marked with angle brackets:

# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=<edit this value: it is the parentCID value from the snapshot's descriptor>
parentCID=ffffffff
isNativeSnapshot="no"
createType="vmfs"

# Extent description
RW <edit here: size in sectors> VMFS "SQL_0-flat.vmdk"

# The Disk Data Base
#DDB

ddb.adapterType = "lsilogic"
ddb.geometry.cylinders = "<edit this value: it is the size in sectors divided by 16065, rounded down>"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.virtualHWVersion = "13"

------------------------------------------------------------------
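
The template can also be filled in programmatically. The sketch below is not a VMware tool; it assumes the flat file's size in bytes is known and that sectors are 512 bytes, which is the unit descriptor extents use. The function name `render_base_descriptor`, the file name `example-flat.vmdk`, the 40 GiB size, and the CID `1a2b3c4d` are all hypothetical placeholders.

```python
SECTOR_BYTES = 512                 # descriptor extents are counted in 512-byte sectors
SECTORS_PER_CYLINDER = 255 * 63    # heads * sectors per track = 16065

TEMPLATE = '''# Disk DescriptorFile
version=1
encoding="UTF-8"
CID={cid}
parentCID=ffffffff
isNativeSnapshot="no"
createType="vmfs"

# Extent description
RW {sectors} VMFS "{flat_name}"

# The Disk Data Base
#DDB

ddb.adapterType = "lsilogic"
ddb.geometry.cylinders = "{cylinders}"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.virtualHWVersion = "13"
'''


def render_base_descriptor(flat_name, flat_size_bytes, cid):
    """Fill the three editable fields of the template above."""
    if flat_size_bytes % SECTOR_BYTES:
        raise ValueError("flat file size is not a multiple of 512 bytes")
    sectors = flat_size_bytes // SECTOR_BYTES
    cylinders = sectors // SECTORS_PER_CYLINDER  # integer division rounds down
    return TEMPLATE.format(cid=cid, sectors=sectors,
                           flat_name=flat_name, cylinders=cylinders)


# Hypothetical inputs: a 40 GiB flat file and a made-up CID.
descriptor = render_base_descriptor("example-flat.vmdk", 40 * 1024**3, "1a2b3c4d")
print(descriptor)
```

Keeping the static lines in one template string makes it easy to compare the result with a descriptor from a healthy VM before uploading it.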


jdtwood
Contributor

Hi there,

Thanks for the reply.

Unfortunately, the file you are requesting, "SQL_0-000001.vmdk", is the one that is missing from Datastore1. It somehow disappeared after the crash.

I will try to work through your instructions to create one from scratch.

Jim

jdtwood
Contributor

I ran the following commands and noticed that the controller type is different: it's lsisas1068 instead of lsilogic. Is that OK?

ls -l SQL_0-flat.vmdk

-rw-------    1 root     root     64424509440 Aug 13 14:08 SQL_0-flat.vmdk

less *.vmx | grep -i virtualdev

scsi0.virtualDev = "lsisas1068"

ethernet0.virtualDev = "vmxnet3"

pciBridge4.virtualDev = "pcieRootPort"

pciBridge5.virtualDev = "pcieRootPort"

pciBridge6.virtualDev = "pcieRootPort"

pciBridge7.virtualDev = "pcieRootPort"
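
With the byte size from the `ls -l` output above, the two numeric descriptor fields follow from simple arithmetic. A small worked sketch, assuming 512-byte sectors; the helper name `flat_size_bytes` is hypothetical.

```python
# The `ls -l` line from the post above, copied verbatim.
LS_LINE = "-rw-------    1 root     root     64424509440 Aug 13 14:08 SQL_0-flat.vmdk"


def flat_size_bytes(ls_line):
    """Pick the size column out of one `ls -l` line."""
    # Columns: mode, links, owner, group, size, month, day, time, name
    return int(ls_line.split()[4])


size_bytes = flat_size_bytes(LS_LINE)
if size_bytes % 512:
    raise ValueError("size is not a whole number of 512-byte sectors")

sectors = size_bytes // 512          # value for the "RW ... VMFS" extent line
cylinders = sectors // (255 * 63)    # 16065 sectors per cylinder, rounded down

print("size in sectors:", sectors)   # 125829120
print("cylinders:", cylinders)       # 7832
```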

jdtwood
Contributor

Indeed the file SQL_0-000001.vmdk is missing.

Failed to reconfigure virtual machine SQL. Unable to access file [datastore1] SQL/SQL_0-000001.vmdk

continuum
Immortal

By now I think you will have 4 descriptor vmdks.

Zip them together and let me check.

I also need the size in bytes or sectors of the 2 flat.vmdks to check your values.

Please read the size via WinSCP or PuTTY. I am not interested in the values displayed by the web interface.


a_p_
Leadership

Assuming you didn't change the state, we should be able to get the VM up and running again. However, due to the out-of-disk-space issue, and the fact that the CID chain has been broken for "SQL_1", you may experience some degree of data corruption or loss.

Since there's currently only ~42GB disk space available on the datastore, and ~60GB RAM, we need to do some temporary steps. If you do have the option to shut down one (or more) of the other VMs that might be helpful, as it would free up disk space currently used by the VM's swap files.

Here are the steps:

  1. shut down other VMs on that datastore if possible
  2. delete the files listed below (they will automatically be recreated if needed)
  3. extract the files from the attached .zip archive, and upload the .vmdk files to the VM's folder on the datastore
  4. reload the VM from the command line (see steps 2+3 from https://kb.vmware.com/s/article/1026043)
  5. create another, temporary snapshot to preserve the VM's current state, so that you can revert to it if necessary
  6. edit the VM's settings
    - reserve memory in the VM's memory configuration, i.e. enable "Reserve all guest memory (All locked)" to reduce the VM's swap file size
    - if you cannot reserve all memory, then either reserve at least half of it, or temporarily reduce the VM's assigned memory
    - (optional, unrelated to the issue) remove the second floppy disk drive if you don't need it
  7. power on the VM, check your data, and then shut it down again (make sure that as little data as possible changes while the VM is powered on)
  8. back up the VM!
  9. From the VM's Snapshot Manager select "Delete All" (this may take some time due to the snapshot size of ~65GB)
    Once the snapshots are gone, you should have >100GB free disk space
  10. remove the VM's memory reservation that has been set in a previous step

At that point you should be able to power on the VM, and use it again.
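
The memory reservation in step 6 frees datastore space because the VM's swap file is sized as the configured memory minus the reserved memory. A minimal sketch of that arithmetic follows; the 32 GB figure is hypothetical, since the thread does not state how much memory the SQL VM has, and `swap_file_gb` is just an illustrative helper.

```python
def swap_file_gb(configured_gb, reserved_gb):
    """Per-VM swap (.vswp) file size: configured memory minus reservation."""
    if not 0 <= reserved_gb <= configured_gb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_gb - reserved_gb


CONFIGURED_GB = 32  # hypothetical; the thread does not state the VM's memory size

# No reservation, half reserved, and "Reserve all guest memory (All locked)".
for reserved in (0, CONFIGURED_GB // 2, CONFIGURED_GB):
    swap = swap_file_gb(CONFIGURED_GB, reserved)
    print(f"reserve {reserved:>2} GB -> swap file {swap} GB")
```

With everything reserved the swap file shrinks to nothing, which is why the option helps on a datastore that is nearly full.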

Please remember that snapshots are not comparable with backups. In fact, they can consume considerable disk space (as you've just experienced). I'd strongly recommend that you do NOT keep snapshots, but rather run daily backups.

If you have questions, please feel free to ask.

André

Files to be deleted:

750781508   4096 -rw-------    1 root     root       3932672 Wed Aug 14 15:11:02 2019 SQL_0-000001-ctk.vmdk

763364420   8192 -rw-------    1 root     root       8192512 Thu Aug 15 19:13:25 2019 SQL_1-000001-ctk.vmdk

16778308       0 -rw-r--r--    1 root     root             0 Fri Aug 16 21:09:15 2019 filelist.txt

369099844  40960 -rw-------    1 root     root      41081213 Thu Dec  7 17:46:54 2017 vmmcores-1.gz

394265668  43008 -rw-------    1 root     root      43742727 Thu Mar 15 13:38:02 2018 vmmcores-2.gz

364905540   8192 -r--------    1 root     root       8237056 Thu Dec  7 17:46:39 2017 vmx-zdump.000

390071364   8192 -r--------    1 root     root       8122368 Thu Mar 15 13:37:46 2018 vmx-zdump.001
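
To see how much space these deletions actually recover, the size column of the listing can be summed. A small sketch; `parse_entry` is a hypothetical helper that assumes the 13-column `ls -elisa` layout shown above.

```python
# The seven listing lines above, copied verbatim.
FILES_TO_DELETE = """\
750781508   4096 -rw-------    1 root     root       3932672 Wed Aug 14 15:11:02 2019 SQL_0-000001-ctk.vmdk
763364420   8192 -rw-------    1 root     root       8192512 Thu Aug 15 19:13:25 2019 SQL_1-000001-ctk.vmdk
16778308       0 -rw-r--r--    1 root     root             0 Fri Aug 16 21:09:15 2019 filelist.txt
369099844  40960 -rw-------    1 root     root      41081213 Thu Dec  7 17:46:54 2017 vmmcores-1.gz
394265668  43008 -rw-------    1 root     root      43742727 Thu Mar 15 13:38:02 2018 vmmcores-2.gz
364905540   8192 -r--------    1 root     root       8237056 Thu Dec  7 17:46:39 2017 vmx-zdump.000
390071364   8192 -r--------    1 root     root       8122368 Thu Mar 15 13:37:46 2018 vmx-zdump.001
"""


def parse_entry(line):
    """Split one listing line into (file name, size in bytes)."""
    # Columns: inode, blocks, mode, links, owner, group, size,
    # weekday, month, day, time, year, name
    fields = line.split(None, 12)
    return fields[12], int(fields[6])


entries = [parse_entry(line) for line in FILES_TO_DELETE.splitlines() if line.strip()]
total_bytes = sum(size for _, size in entries)
print(f"{len(entries)} files, {total_bytes} bytes ({total_bytes / 2**20:.1f} MiB)")
```

The total comes to roughly 108 MiB, which suggests that most of the reclaimed space has to come from the swap file and snapshot steps rather than from this cleanup.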

jdtwood
Contributor

Hi Andre,

Thanks very much for the detailed instructions.  I am confused about one point.

When I upload the files from the SQL fixes folder, do I ONLY upload those that are not in the datastore SQL VM?

Both of the following files are already in the SQL VM folder.

SQL_1.vmdk (500 GB)

SQL_1-000001.vmdk (60.93)

The two files below are missing.

SQL_0.vmdk

SQL_0-000001.vmdk

Thanks!

Jim

jdtwood
Contributor

Also, is the step of taking a temporary snapshot strictly necessary, in case the snapshot cannot be created?

If I get "Failed to create snapshot 'xxx' on virtual machine SQL", can I skip that step, or does that mean I have another problem?

Thanks!

Jim

jdtwood
Contributor

Power On VM

Key: haTask-12-vim.VirtualMachine.powerOn-4449

Description: Power On this virtual machine

Virtual machine: SQL

State: Failed - File system specific implementation of Ioctl[file] failed

Errors

File system specific implementation of Ioctl[file] failed

File system specific implementation of LookupAndOpen[file] failed

File system specific implementation of LookupAndOpen[file] failed

File system specific implementation of LookupAndOpen[file] failed

File system specific implementation of LookupAndOpen[file] failed

File system specific implementation of LookupAndOpen[file] failed

File system specific implementation of LookupAndOpen[file] failed

File system specific implementation of LookupAndOpen[file] failed

File system specific implementation of LookupAndOpen[file] failed

The parent virtual disk has been modified since the child was created. The content ID of the parent virtual disk does not match the corresponding parent content ID in the child

Cannot open the disk '/vmfs/volumes/589ba69b-8042b0c0-14b3-1c98ec129dac/SQL/SQL_1-000001.vmdk' or one of the snapshot disks it depends on.

Module 'Disk' power on failed.

Failed to start the virtual machine.
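
The content ID error above describes a simple rule: in descriptor terms, the parentCID of a snapshot must equal the CID of the disk it depends on. The sketch below checks that rule for one parent and child pair. The descriptor fragments and their CID values are hypothetical, and `read_cids` and `chain_is_consistent` are illustrative helpers, not VMware tooling.

```python
import re


def read_cids(descriptor_text):
    """Return (CID, parentCID) from a descriptor's text."""
    cid = re.search(r'^CID=([0-9a-fA-F]{8})\s*$', descriptor_text, re.MULTILINE)
    parent = re.search(r'^parentCID=([0-9a-fA-F]{8})\s*$', descriptor_text, re.MULTILINE)
    if cid is None or parent is None:
        raise ValueError("descriptor is missing a CID or parentCID line")
    return cid.group(1).lower(), parent.group(1).lower()


def chain_is_consistent(parent_text, child_text):
    """True if the child's parentCID matches the parent's CID."""
    parent_cid, _ = read_cids(parent_text)
    _, child_parent_cid = read_cids(child_text)
    return parent_cid == child_parent_cid


# Hypothetical descriptor fragments, reduced to the two relevant lines.
BASE = "CID=aaaa1111\nparentCID=ffffffff\n"
SNAPSHOT_OK = "CID=bbbb2222\nparentCID=aaaa1111\n"
SNAPSHOT_BROKEN = "CID=bbbb2222\nparentCID=cccc3333\n"

print(chain_is_consistent(BASE, SNAPSHOT_OK))      # True
print(chain_is_consistent(BASE, SNAPSHOT_BROKEN))  # False
```

Running a check like this against SQL_1.vmdk and SQL_1-000001.vmdk would show which of the two values is out of step before anything is edited.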
