VMware Cloud Community
tomjacobchirayi
Contributor

Virtual Machine snapshot failed and data loss

Hi,

We had a production server with more than one virtual disk, placed on different datastores. Veeam Backup & Replication 7 was running a backup job against it at the time. A user copied a large amount of data, and because the virtual machine was running on a snapshot, one of the datastores ran out of space. The virtual machine then went down, waiting for a user to answer a question.

At this point we stopped the Veeam backup job and waited a few minutes. Once the job had stopped in Veeam, we restarted the virtual machine. However, the virtual machine did not behave as expected:

1. The virtual machine no longer showed as running on a snapshot (checked in Snapshot Manager).

2. All data written since the snapshot was taken was lost.

3. When browsing the datastore, I could still see some of the delta disks and snapshot files.

Is there any way to recover that data?

Thanks.

Tom Jacob


5 Replies
nielse
Expert

Do the time stamps on these files match the moment the backup/snapshot was taken?

I think these are just leftover files, which you can only clean up by running "Consolidate" on the virtual machine...

@nielsengelen - http://foonet.be - VCP4/5
f10
Expert

Hi Tom,

When working with snapshots, I don't rely on the GUI alone and always use the CLI to confirm details such as the latest snapshot file that the .vmx points to and the parent/child relationships between the disks. In your case, not seeing any snapshots in the Snapshot Manager does not necessarily mean that the VM has no snapshots. The Snapshot Manager displays its information from the *.vmsd file, so if this file is corrupt or missing you won't see any snapshots listed there.

Now to your actual issue. For example, if the VM was pointing to snap1-00000.vmdk, you have since booted the VM from the base disk, and there have been writes to the base disk, then pointing the VM back to snap1-00000.vmdk would result in data corruption. However, if you have not powered on the VM, or there haven't been any new writes, then you may try to consolidate the snapshots. I am not sure whether killing the backup job caused this; I haven't used Veeam backup, but I believe it also uses VM snapshot technology. Your best bet is to look at the *.vmx file to understand the current configuration, and to check the time stamps of the base and snapshot disks to understand which is more recent.
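To script that check, here is a minimal sketch in Python, meant to be run against local copies of the .vmx and the small .vmdk descriptor files. It relies on the `scsiX:Y.fileName` entries in the .vmx and on the `CID`, `parentCID` and `parentFileNameHint` fields in the descriptors. The function names (`disks_in_vmx`, `descriptor_info`, `walk_chain`) are made up for this illustration, and it assumes that all descriptors of one disk sit in a single folder, so for disks on other datastores you would have to pass that disk's own folder.

```python
import re
from pathlib import Path

# Matches .vmx lines such as: scsi0:0.fileName = "server-000001.vmdk"
VMX_DISK_RE = re.compile(
    r'^\s*((?:scsi|ide|sata)\d+:\d+)\.fileName\s*=\s*"([^"]+\.vmdk)"',
    re.I | re.M,
)
# Matches descriptor lines such as: parentFileNameHint="server.vmdk"
DESC_KEY_RE = re.compile(
    r'^\s*(CID|parentCID|parentFileNameHint)\s*=\s*"?([^"\r\n]*)"?',
    re.M,
)


def disks_in_vmx(vmx_text):
    """Return {device: descriptor file name} for every disk the .vmx points to."""
    return {dev.lower(): name for dev, name in VMX_DISK_RE.findall(vmx_text)}


def descriptor_info(desc_text):
    """Return the CID, parentCID and parentFileNameHint fields that are present."""
    return {key: value.strip() for key, value in DESC_KEY_RE.findall(desc_text)}


def walk_chain(folder, first_descriptor):
    """Follow parentFileNameHint links from one descriptor back to the base disk.

    Returns a list of (file name, info dict); info is None if a file is missing.
    """
    chain, seen, name = [], set(), first_descriptor
    while name and name not in seen:
        seen.add(name)
        # Hints may carry a full datastore path; only the file name is used here.
        path = Path(folder) / Path(name).name
        if not path.is_file():
            chain.append((name, None))  # a link in the chain is missing
            break
        info = descriptor_info(path.read_text(errors="replace"))
        chain.append((name, info))
        name = info.get("parentFileNameHint")  # absent on a base disk
    return chain
```

If the .vmx lists a base disk (no `parentFileNameHint`) while delta descriptors still exist in the folder, the VM is no longer running on those snapshots. In an intact chain, each child's `parentCID` should equal its parent's `CID`; a mismatch suggests the parent was written to after the snapshot was taken.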

-f10

http://highoncloud.blogspot.in/

About VMware Virtualization on NetApp

Regards, Arun Pandey VCP 3,4,5 | VCAP-DCA | NCDA | HPUX-CSA | http://highoncloud.blogspot.in/ If you found this or other information useful, please consider awarding points for "Correct" or "Helpful".
a_p_
Leadership

First question: Did you power off the VM immediately after you discovered the issue?

To find out what can be done, you need to provide a list of the files (with names, sizes and time stamps) in the VM's folder, as well as some of the files themselves. To get at all of these files you have to enable SSH on a host and use e.g. WinSCP, because the datastore browser hides some of the files. Using WinSCP, download the VM's .vmx, .vmsd and vmware*.log files, as well as all the .vmdk header files (the small ones without flat/delta in their names). Then compress/zip the downloaded files and attach the .zip file to a reply post (to be able to attach files, click "Use advanced editor").

Are there other VMs running on the same datastore?
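If the VM's folder is reachable from a machine with Python 3, the file list can also be produced with a short script. This is only a sketch: the function names `list_vm_folder` and `format_listing` are made up for illustration, and the time stamps are shown in the local time zone of the machine running it.

```python
import os
import time


def list_vm_folder(folder):
    """Return (name, size in bytes, time stamp) for each file, oldest first."""
    found = []
    for entry in os.scandir(folder):
        if entry.is_file():
            st = entry.stat()
            found.append((st.st_mtime, entry.name, st.st_size))
    found.sort()  # sort on the raw modification time
    return [
        (name, size, time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime)))
        for mtime, name, size in found
    ]


def format_listing(rows):
    """Render the rows as one line per file: time stamp, size, name."""
    return "\n".join(f"{stamp}  {size:>15,}  {name}" for name, size, stamp in rows)
```

Calling `print(format_listing(list_vm_folder(path)))` gives a listing that can be pasted into a reply. Sorting by modification time makes it easy to see which disks were written to last.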

André

tomjacobchirayi
Contributor

Hi,

My server is running on ESXi 5.0. I found out about the problem when our monitoring tool alerted me that the virtual server was down. When I checked from the ESXi console, the VM was waiting for an answer to a question (saying that there was no more free space available in the datastore, and that the virtual machine could be restarted after freeing up some space) with two options, "Retry" and "Cancel". I could only select Cancel at that point. After I selected Cancel, the virtual machine was left in the 'Powered Off' state. I consolidated the snapshot at this time. The Veeam backup job was in the 'stopping' state throughout.

After this, I tried to power on the virtual machine, but I got a prompt saying that the vmdk files were locked. I waited a while for the Veeam backup job to stop. After that, I started the virtual machine and it came up.

I did not stop the virtual machine at this point, even after noticing the data loss. I was not able to stop it, since that would again have taken some of our critical apps down, so we had to move forward with the data loss. We lost the data for the whole period, starting from the time Veeam took the snapshot of the virtual machine.

Thanks.

dhanarajramesh

The reason is that once you cancel the backup job in a VADP proxy-based backup tool such as Veeam or NetWorker, VADP consolidates the snapshots it created and cancels the backup job. There is a high chance of this consolidation failing. Check for this in the VM's Tasks & Events or in the host's Tasks & Events. If that is the case, you have to consolidate all the snapshots manually.
