VMware Cloud Community
_usr_local_dick
Contributor

Snapshots of Ubuntu Lucid VMs take too long - VCB kills them

Hi guys

I have recently upgraded our VMware setup. It now consists of two ESXi 4.1 hosts and vCenter 4.1.

Since the upgrade, snapshots of Ubuntu Lucid (10.04) VMs take much longer.

This surfaced because we use VCB for backups: the snapshot would time out and the VM would become stuck in an unusable state.

Snapshotting manually does work, but it takes 4-5 minutes, which is too long for VCB, so it times out and leaves the VM stuck.

I also tried taking the snapshot directly on the ESX host (instead of from vCenter). This also takes 4-5 minutes, which I guess rules out any vCenter firewall/port issues.

Snapshotting other VMs is a lot faster (basically the same as before the upgrade): Ubuntu 8.04, Windows 2003, and Windows 2008 VMs all take about 20-40 seconds, depending on the amount of RAM. These VMs therefore get backed up fine by VCB.

I am using the distro-provided tools in Ubuntu 10.04 LTS. This release finally has a good mechanism to recompile open-vm-tools upon kernel upgrades, so I was very happy with that. At the moment, the tools are:

visser@lucid:~$ apt-show-versions | grep open-vm

open-vm-dkms/lucid uptodate 2010.02.23-236320-1+ubuntu1

open-vm-tools/lucid uptodate 2010.02.23-236320-1+ubuntu1

FYI, our VMs have either 1 or 2GB of RAM.

I want to find out why snapshots take so long. Where should I look?

In the meantime I would also like to increase the timeout that VCB uses for taking snapshots.

I have now removed the VMs in question from my backup regime, because backups don't work anyway, and at least the VMs stay alive.

But having no backup is not an option, so this is a serious issue.

I will also ask around on the open-vm-tools mailing list.

8 Replies
potentialgenius
Contributor

If you are using the snapshots for backup purposes, have you tried other options, such as using backup software from within the VM to a SAN, to see if there are still problems? Are you 100% sure it is a problem with ESXi's backup procedures?

Maybe creating snapshots from the (unsupported) shell in ESXi will allow you to monitor their progress more closely. I'd also try to rule out problems with the VM's configuration.

_usr_local_dick
Contributor

We use VCB on a host connected to the SAN. This software uses vCenter to make a snapshot of a VM, then pulls a copy of the VM through the SAN, then removes the snapshot again.

This worked fine until we upgraded to 4.1.

I have tried changing the SCSI HBA from VMware Paravirtualized to LSI Logic Parallel, but that did not make any significant difference.

I'd rather not change the way we make backups, because basically it has always worked great...

I will try to do a manual snapshot from the ESX host and see what is logged there.

_usr_local_dick
Contributor

The snapshot creation still takes 3:36 with the VM in question (Ubuntu 64 bit 10.04, 2 GB RAM, 1 CPU):

~ # time vim-cmd vmsvc/snapshot.create 160 Test_Snap Blah 1

Create Snapshot:

real 3m 36.19s

user 0m 0.66s

sys 0m 0.00s

If I stop the vmtools in the guest and unload all the vm modules, the result is still the same.

For comparison, snapshotting an Ubuntu 8.04 32 bit VM with 2 GB RAM takes about a third of the time:

~ # time vim-cmd vmsvc/snapshot.create 96 Test_Snap Blah 1

Create Snapshot:

real 1m 11.60s

user 0m 0.35s

sys 0m 0.00s

Maybe it has to do with 64 bit VMs? I tried a Windows 2008R2 VM, 64bit, also 2 GB RAM:

~ # time vim-cmd vmsvc/snapshot.create 144 Test_Snap Blah 1

Create Snapshot:

real 3m 25.72s

user 0m 0.63s

sys 0m 0.00s

Maybe this is the issue? I will do some more benchmarking with my VMs.
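
For anyone repeating this comparison, the `real` lines above can be converted to seconds and compared directly. The following is only a minimal Python sketch: the helper name `parse_real` and the VM labels are made up for illustration, and the only data used are the three timings from the runs above.

```python
import re

# Matches the "real" line printed by `time` in the transcripts above,
# for example "real 3m 36.19s".
REAL_LINE = re.compile(r"real\s+(\d+)m\s*([\d.]+)s")


def parse_real(line):
    """Return the elapsed wall-clock time in seconds for one `real` line."""
    match = REAL_LINE.search(line)
    if match is None:
        raise ValueError("not a `real` timing line: %r" % line)
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the three runs above (the labels are hypothetical).
timings = {
    "ubuntu-10.04-64bit": parse_real("real 3m 36.19s"),
    "ubuntu-8.04-32bit": parse_real("real 1m 11.60s"),
    "win2008r2-64bit": parse_real("real 3m 25.72s"),
}

# Express every run as a multiple of the fastest (Ubuntu 8.04) run.
baseline = timings["ubuntu-8.04-32bit"]
for name, secs in sorted(timings.items()):
    print("%-20s %7.2fs  %.2fx baseline" % (name, secs, secs / baseline))
```

Three samples are not a benchmark, so the ratios are only a rough way to line up further runs against each other.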

In the meantime, I could not find any command-line option for vcbmounter.exe to increase the snapshot timeout.

potentialgenius
Contributor

That benchmarking could be very useful. Could you also tell us the hardware specifications of the server in question? Any relevant BIOS options, architecture, CPU, RAM, etc.? The 64-bit part of your last reply made me wonder if perhaps architecture is the issue. I haven't attempted snapshotting a 64-bit VM, so maybe I will give that a go and let you know if I see any speed issues myself.

Edit:

Also I found one article from the KB and another - not snapshot time out but maybe these will help? Also make sure all previous snapshots have been deleted.

_usr_local_dick
Contributor

Some more info: my Ubuntu Lucid 64 bit VMs go offline when the VCB snapshot times out, and there are loads of errors like:

task xxx:123 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.

See attached console pic.
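
If the console text can be captured (for example copied from the console, or from `dmesg` output if the guest is still reachable), the hung-task warnings can be summarised per task. This is a rough Python sketch: the regular expression is based only on the message format quoted above, and the function names are hypothetical.

```python
import re
from collections import Counter

# Pattern derived from the warning quoted above:
#   task xxx:123 blocked for more than 120 seconds.
HUNG_TASK = re.compile(r"task (.+?):(\d+) blocked for more than (\d+) seconds")


def find_hung_tasks(log_text):
    """Return a list of (task_name, pid, seconds) for each warning found."""
    return [
        (name, int(pid), int(secs))
        for name, pid, secs in HUNG_TASK.findall(log_text)
    ]


def count_by_task(log_text):
    """Count how many warnings were logged for each task name."""
    return Counter(name for name, _pid, _secs in find_hung_tasks(log_text))


# Sample input: the placeholder message from this post, repeated once.
sample = (
    "task xxx:123 blocked for more than 120 seconds.\n"
    '"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.\n'
    "task xxx:123 blocked for more than 120 seconds.\n"
)

for name, hits in count_by_task(sample).items():
    print("%s: %d warning(s)" % (name, hits))
```

Knowing which tasks are blocked may help show whether everything touching the disk is stuck, although a frozen guest may not have written these warnings to its own log files.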

J1mbo
Virtuoso

It seems to me there is an issue in the latest version of the vmware tools affecting some operating systems. Win2k3 is affected, WinXP and Win7 are not, for example. Win2k3 snapshot creation, with guest filesystem quiesce selected (but not RAM) takes 2s on 4.0 tools, 30 - 60s on 4.1.


_usr_local_dick
Contributor

I've narrowed it down to an issue with the "quiesced" parameter. It does not matter if the RAM is snapshotted, so I'll leave that out.

Creating a snapshot without quiescing works fine:

~ # time vim-cmd vmsvc/snapshot.create 128 TestSnap Blah 0 0

Create Snapshot:

real 0m 2.07s

user 0m 0.20s

sys 0m 0.00s

However, with "quiesced" turned on the snapshot fails:

~ # time vim-cmd vmsvc/snapshot.create 128 TestSnap Blah 0 1

Create Snapshot:

Create snapshot failed

real 0m 16.29s

user 0m 0.23s

sys 0m 0.00s

After this the VM becomes unresponsive and the console displays lots of messages like this:

task xxx:123 blocked for more than 120 seconds.

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.

I will install a VM with the supported tools to check that it works there.

I'll file a bug report for Ubuntu's open-vm-tools package.
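
The two runs above differ only in the last argument to `vim-cmd vmsvc/snapshot.create`, which in this thread is the quiesce flag (the one before it is the memory flag). A minimal Python sketch of that comparison follows. It assumes Python and `vim-cmd` are available on the same machine, all function names are hypothetical, and I have not verified that `vim-cmd` returns a non-zero exit status when it prints "Create snapshot failed".

```python
import subprocess
import time


def snapshot_create_argv(vmid, name, description, include_memory, quiesced):
    """Build the vim-cmd argument list in the order used in this thread."""
    return [
        "vim-cmd", "vmsvc/snapshot.create", str(vmid), name, description,
        "1" if include_memory else "0",  # 4th argument: snapshot the RAM
        "1" if quiesced else "0",        # 5th argument: quiesce the guest
    ]


def time_command(argv, run=subprocess.call):
    """Run argv and return (exit_status, elapsed_seconds)."""
    start = time.time()
    status = run(argv)
    return status, time.time() - start


def compare_quiesce(vmid, run=subprocess.call):
    """Time one non-quiesced and one quiesced snapshot, both without RAM."""
    results = {}
    for quiesced in (False, True):
        argv = snapshot_create_argv(vmid, "TestSnap", "Blah", False, quiesced)
        results[quiesced] = time_command(argv, run=run)
    return results


# Example call (not executed here); 128 is the VM id used above:
#   for quiesced, (status, secs) in compare_quiesce(128).items():
#       print("quiesced=%s status=%s %.2fs" % (quiesced, status, secs))
```

Since the quiesced run left the guest hung in the test above, this is best tried on a disposable VM, and the test snapshots need removing afterwards.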

_usr_local_dick
Contributor

I tried the officially supported tar installer, and there are no problems with that.

I filed a bug report with Ubuntu: https://bugs.launchpad.net/ubuntu/source/open-vm-tools/bug/611644
