Hello,
I know this post might go unanswered but it has been a very long 28 hour day already and I am becoming quite worried and frustrated.
I have a VM with 100GB disk, 200GB provisioned size on an ESXI 5 server. It had one snapshot which had a 16GB size. I, now know, made the mistake of letting this snapshot run for 2 months. When I tried to commit this snapshot to the parent disk, this action is now going on for 16+ hours.
There is disk activity, the log get written every so often, and I see that it is going though a cycle of writing to two files:
000001-delta.vmdk and 000002-delta.vmdk
Around 1.5GB gets written on to number one, then it goes to number 2 and does it all over again, it has done this cycle at least 3 times already. The progress bar on teh vSphere Client shows 85% completed. It gains a few % points on every cycle.
Some more information, there is high CPU usage, has been pretty high since the process started. I had to shutdown two other VMs for the night because they were being affected by the high CPU usage. There is enough space on the HD, around 140GB free.
Here are some of the last lines written to the log:
Any advice will be greatly appreciated. Let me know if anyone has gone through this ordeal and how they got out of this mess.
Thank you in advance.
GPB
A small update...
Just noticed there is a file called [VM name]Snapshot75.vmsn on the VM folder. Earlier today there was a similar file but it was numbered 72. Not sure if there were others before with smaller numbers.
There was only one snapshot for this VM, so where is it getting these numbers from?
GPB
Another thing I have noticed is that the recent task window shows, every so often, a "Manipulate file paths" task which was completed successfully.
I have seen this show up at least 6 or 7 times now.
Still going (just like the Energizer Bunny). It´s going to get slower now that users started to come in to work and I had to enable the other VMs.
Lessons learned out of all this mess, NEVER EVER leave snapshots for too long and do not rely on them too much since they can be a bit unstable.
Has anyone experienced this before? I am really amazed at how long this is taking.
By the way the VM in question is the mail server, 10 hour downtime yesterday and looking at another 12 today. Good thing it is Friday!!!
GPB
Believe it or not, it is still going. I already brough back up a copy of the server so email is back on but I just want to see the end of this 25.5 hours and 88%!!!
This cannot be normal. 16GB snapshot, 100GB parent. What is going on?! :smileyshocked:
Last update....42 hours later the consolidation process finished and I am glad to say the VM works normally.
Very strange it took this long, it is a less than a year old Proliant DL360 G7 server with two Intel Xeon 2.4Ghz processors and 24GB RAM, 15K rpm HDs, running 6 VMs.
Word to the wise, if you use snapshots commit them soon...or it could be a very long and painful consolidation process.
GPB
I agree, thats why vmware recommends that not to have many snapshots or commit/delete before it grows very old,
I can definately understand your mindset , i had a similar case where a snapshot was almost 1 year old, and it took almost 48 hrs to delete, luckily for me the it didnt impact to the VM.
Although I agree that snapshots should not be kept longer than really needed, the time needed to delete a 16GB snapshot should not be that long. Even without BBWC/FBWC (which makes a HUGE difference in disk performance) it should not take longer than ~1hour to complete this job.
For HP hosts there are basically 2 major things you need for good performance. The first one is BBWC/FBWC for disk performance and the second one is to set "Static High Performance" in the BIOS Power settings for processing performance.
André
Thats true but I never figured out why it took so long for me (I was using Dell M910 and NetApp Storage). I didn't want to cancel as it was my Exchange Server and i was afraid it might cause some impact
I am now going thru this dreadful experience, I started my consolidation of VM snapshots yesterday at 6pm and now it is 1:30pm the next day, progress bar still shows 99%. I do not want to stop the process that could result in a corrupted file. This just bugs me because this is a common function on VMware and now I look like an idiot in front of our customer.
I logged onto the ESX host and changed directories to volume\lun\vm and I see the second hard drive always updated with the current time when I do an ls -l command. This is the hard disk I need to get to in order to restore our customers database, but everytime I try to manipulate the .vmdk file it says its locked.
Its going on 24 hours now, how long should this take. I just started but I think they had snapshots going back a couple of months.