Has anyone else experienced problems committing snapshots, where during the commit process the VM gets paused (an I/O pause, I would assume) for long enough to interrupt clients communicating with the VM?
We just tried to commit a ~50GB snapshot (about a week old), and several times during the process the VM paused for long enough to disconnect file shares, disconnect client applications that rely on daemons running in the VM, etc.
Cheers.
^ In my experience that's caused by a lack of service console memory. Try upping your SC memory to 768MB or something. Good luck!
Hello,
Short timeouts are to be expected when initiating and committing the snapshot.
The larger the delta file is, the longer the timeout will be.
Snapshots should only be used for a very short time.
e.g. back up the parent files and config, then commit the snapshot.
You do not want them to grow; they will significantly degrade performance.
Snapshots are not backups; they are deltas of disk changes and will significantly increase the disk I/O load on the storage subsystem.
Correct, snapshots are NOT backups - you don't want to run on your snaps for very long. The Virtual Center console operation will likely time out; just keep watching the datastore for the snaps to go away, then you know you're good.
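For anyone who would rather not sit refreshing the datastore browser, here is a rough sketch of the "keep watching until the snaps go away" idea. It assumes the VM's folder is readable from a machine with Python 3 (for example over an NFS mount) and that snapshot deltas are named `*-delta.vmdk`; the function names are made up for illustration and nothing here talks to ESX itself.

```python
import glob
import os
import time


def delta_files(vm_dir):
    """Return a sorted list of (path, size_in_bytes) for delta files in vm_dir."""
    pattern = os.path.join(glob.escape(vm_dir), "*-delta.vmdk")
    found = []
    for path in glob.glob(pattern):
        try:
            found.append((path, os.path.getsize(path)))
        except OSError:
            # The file vanished between listing and stat; treat it as gone.
            pass
    return sorted(found)


def wait_for_commit(vm_dir, poll_seconds=30, max_polls=120):
    """Poll until no delta files remain. Return True if they all disappeared."""
    for _ in range(max_polls):
        remaining = delta_files(vm_dir)
        if not remaining:
            return True
        for path, size in remaining:
            print(f"{os.path.basename(path)}: {size / 2**20:.1f} MiB still present")
        time.sleep(poll_seconds)
    return False
```

Calling `wait_for_commit()` with the VM's folder (whatever path it is mounted at on your side) returns True once the folder holds no delta files, or False if they are still there after `max_polls` checks.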
I think the original poster is having other issues than just the command timing out though; he loses connectivity to his guest VMs, which is not good. I still suggest upping the service console memory.
Hello,
The loss of connectivity is due to the time it takes to commit the temp delta file. When the snapshot is committed, it creates a temp disk and mem file that hold all the changes made since the delta vmdk started writing out to the parent vmdk. When the temp files are committed, the VM will not respond until the commit is complete.
You are correct in adding memory to the COS resource pool; it will help speed up the process and reduce the negative impact.
Increasing the service console memory, as has been suggested, is definitely one good recommendation. Also, what version and build of ESX are you running? I know there have been a number of snapshot-related patches released by VMware for all of the 3.x versions, including some that addressed VMs losing connectivity during large snapshot commits.
This is the KB article for a 3.5 patch that was released in May
In my experience that's caused by a lack of service console memory. Try upping your SC memory to 768MB or something. Good luck!
You are correct in adding memory to the COS resource pool; it will help speed up the process and reduce the negative impact.
Increasing the service console memory as has been suggested is definitely one good recommendation
Thanks guys, will try that. Is there a "recommended" amount? boostedevo suggested 768MB; a few docs on the net suggested 512MB for VI 3.x.
Short timeouts are to be expected when initiating and committing the snapshot.
How short are we talking here though? I wouldn't have expected it to interrupt client <-> server communication to the extent that file shares disconnect, etc.
Snapshots should only be used for a very short time.
Correct, snapshots are NOT backups - you don't want to run on your snaps for very long.
Yeah, understood. Unfortunately this was one we'd forgotten about.
Also, what version and build of ESX are you running?
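Since forgotten snapshots are exactly the ones that grow large, and larger deltas were said above to mean longer stalls at commit time, a periodic sweep for oversized delta files could catch them earlier. The following is only a sketch under the same assumptions as any file-level check: the datastore tree is readable from a machine with Python 3, deltas are named `*-delta.vmdk`, and the 5 GiB default threshold and the function name are arbitrary choices for illustration, not a VMware recommendation.

```python
import os


def large_deltas(root, min_bytes=5 * 2**30):
    """Walk root and return (size, path) for delta files >= min_bytes, largest first."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith("-delta.vmdk"):
                continue
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # File disappeared (e.g. a commit just finished); skip it.
                continue
            if size >= min_bytes:
                hits.append((size, path))
    return sorted(hits, reverse=True)
```

Run against the top of a datastore, the result lists the biggest deltas first, which is roughly the order in which they deserve attention.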
3.5 Update 2, all patches.
Cheers,
Matt Kilham
Just to add to the thread... I had the same issue. ESX 3.5 Update 4 with all patches up to 30/07/2009.
In our case, the network drop happens more often when creating snapshots for virtual Citrix servers. Less busy servers don't seem to experience this.