Good Morning All,
Thanks for taking the time to look!
I have an issue with numerous virtual servers on our server estate.
When Veeam backs up a server it takes a snapshot, and when the backup is complete it removes the snapshot. The removal then causes the server to hang and lose connectivity for anywhere up to 1 hour, causing an outage for end users.
I have approached Veeam and they advised this was caused by HA monitoring. I have disabled this, but the issue still occurs.
I have also disabled CBT (Changed Block Tracking) in the Veeam backup job, which required all snapshots to be removed first, and the same thing still happens.
The odd thing is that this does not occur on every single backup; it happens roughly once a month.
Any feedback or input would be greatly appreciated.
Hello and welcome to the communities.
What OS/apps are on these VMs?
Are the VMware Tools installed?
Check out http://kb.vmware.com/kb/1013163 for some additional info on this.
I can confirm that VMware Tools are installed, and that the VMs are running Windows Server 2008 R2 Enterprise.
I have read the linked article. I understand that it references 64 seconds, but this is more like 64 minutes(!!)
Again, any help would be appreciated.
I believe VMware is completely responsible for the process of creating and removing snapshots, while Veeam just plays the “requestor” role.
Thus, it makes sense to reproduce the situation and see whether the issue should be addressed by Veeam or by VMware.
So, you can take a snapshot of your production VMs manually via the vSphere Client and keep it open for roughly as long as it takes to back up the VM before deleting it. Then trigger the snapshot commit operation and check whether you see similar behaviour.
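To put a number on the hang while running the manual test above, something like the following could be left running from another machine. It is only a rough sketch: it assumes the VM answers on TCP port 3389 (RDP), and the function names and the 5-second threshold are made up for illustration.

```python
import socket
import time
from typing import List, Tuple


def probe(host: str, port: int = 3389, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def monitor(host: str, duration: float, interval: float = 1.0,
            port: int = 3389) -> List[Tuple[float, bool]]:
    """Probe the host every `interval` seconds for `duration` seconds."""
    samples = []
    deadline = time.time() + duration
    while time.time() < deadline:
        samples.append((time.time(), probe(host, port)))
        time.sleep(interval)
    return samples


def find_outages(samples: List[Tuple[float, bool]],
                 min_gap: float = 5.0) -> List[Tuple[float, float]]:
    """Collapse (timestamp, reachable) samples into (start, end) outage windows.

    Windows shorter than `min_gap` seconds are ignored as blips.
    """
    outages = []
    down_since = None
    for ts, ok in samples:
        if not ok and down_since is None:
            down_since = ts              # first failed probe of a new outage
        elif ok and down_since is not None:
            if ts - down_since >= min_gap:
                outages.append((down_since, ts))
            down_since = None            # host is reachable again
    # Handle an outage that is still ongoing when sampling stops.
    if down_since is not None and samples:
        last_ts = samples[-1][0]
        if last_ts - down_since >= min_gap:
            outages.append((down_since, last_ts))
    return outages
```

For example, `find_outages(monitor("sql01.example.local", 3600))` (the hostname is a placeholder) would list the windows in which the VM was unreachable during an hour that covers the snapshot commit. Comparing those windows with the time of the snapshot removal task in vCenter shows whether the two line up.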
I also use Veeam Backup (6.5 specifically) and we experience very similar issues with what is classed as a highly transactional SQL Server (6 vCPU, 48 GB RAM, 2008 R2 Enterprise).
I notice that during snapshot removal (at 95%) the SQL VM freezes, which is very, very bad.
I have even created a snapshot working LUN to store the delta snapshot, but it has made no difference.
I also noticed this when I created a snapshot of my vCenter server: it ran very slowly, then my pings to the server timed out and the VM dropped all RDP connections. A few seconds later the server came back.
Is it normal for pings to a VM to stop while a snapshot is being taken or removed?
I have also read a little about stun times in the VM log, but found no real solution.
Using ESXi 5.0.2 -
2 x Dell EqualLogic 6510X SAN (96 x 600GB disks)
iSCSI 10Gb SAN switch, set up for jumbo frames
Dell MEM Plugin 1.0.2
Backing up to VMFS, as Veeam doesn't back up RDM or iSCSI direct-connect LUNs
The SQL server is where we need performance and 24/7 operation, but as mentioned I seem to have issues with timeouts on my lower-resource VMs too.
Hopefully someone can suggest a fix.
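On the stun times mentioned above: a small script can pull them out of the VM's vmware.log so they can be compared against the times users reported the freeze. This is a sketch only. It assumes the log records each stun with a line containing "Checkpoint_Unstun: vm stopped for N us"; check a sample of your own log first, since the exact wording may differ between ESXi builds. The function names and the 1-second threshold are made up for illustration.

```python
import re
from typing import Iterable, List

# Assumed log wording; verify against your own vmware.log before relying on it.
STUN_RE = re.compile(r"Checkpoint_Unstun: vm stopped for (\d+) us")


def stun_times_seconds(lines: Iterable[str]) -> List[float]:
    """Return every stun duration found in the log lines, in seconds."""
    times = []
    for line in lines:
        match = STUN_RE.search(line)
        if match:
            # The log value is in microseconds.
            times.append(int(match.group(1)) / 1_000_000)
    return times


def long_stuns(lines: Iterable[str], threshold_s: float = 1.0) -> List[float]:
    """Return only the stuns at or above threshold_s seconds."""
    return [t for t in stun_times_seconds(lines) if t >= threshold_s]
```

Usage would be along the lines of `with open("vmware.log") as f: print(long_stuns(f))`. If the stuns it reports are short while the VM is unreachable for minutes, the freeze is more likely storage I/O during consolidation than the stun itself.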
As already mentioned, the problem with snapshot removal usually isn't on the Veeam side but on the VMware side.
So, as a first step, you can follow the manual procedure described above and see whether you experience a similar issue.
There is also an existing thread at Veeam Community Forum; might be worth reviewing: http://forums.veeam.com/viewtopic.php?f=24&t=2716&p=79491
We are having the exact SAME issue here. The server is a 2008 R2 SQL server with 16 GB of RAM and 8 vCPUs. This is our most mission-critical server, so we are doing Veeam backups every 3 hours. Not every backup causes it; it's more like 1 in 4. The VM will lock up for anywhere from 1 minute to 15 minutes, which is obviously a HUGE problem with 100+ active users. Please let me know if I can help get to the bottom of this.
I think it is because you are creating a snapshot that includes the virtual machine's memory. Can you try creating a snapshot without the memory (try it directly from your vCenter)?
I am surprised Veeam blamed this on HA monitoring as this has nothing to do with snapshots, CBT, etc. As suggested in other posts, this is probably due to the VM being very busy from an I/O perspective; thus, it is taking a very long time to remove (commit) the snapshot created by a backup job. VMware is constantly looking for ways to improve VM snapshots. In the meantime, I suggest an agent-based approach for backing up very busy VMs like SQL Server. Many backup vendors tout the fact that they are 100% agent-less. This can be a shortcoming, as we've seen in this thread. Agent-less is fine for the large majority of VMs out there, but certain workloads are best backed up with an agent. Take a look at vSphere Data Protection Advanced, which performs VM backups without the need for an agent, but also includes agents for those exceptional workloads such as SQL Server and Exchange.