VMware Cloud Community
NvR4GeT
Enthusiast

VDP Fails with Error E10052 - Seems VMs freeze while VDP backs up.

Hello Everyone,

I'm working with VDP and trying to back up 7 VMs. I don't have this issue at any of our other clients. At one client, VDP runs at the default scheduled time and fails to back up all of the servers. I have yet to see it back up every server without issues.

The error in vSphere Web Client tells me the following:

VDP: Backup job (*****) failed to backup client ******. Execution error: E10052:Failed to create snapshot.. Job ID: ******************** (Full Client path: )

After browsing around I was able to finally find the avamarclient logs and received a bit more information on the root issue here:

2013-03-29 20:00:42 avvcbimage Info <16001>: Found 2 disk(s), 0 snapshots, and 0 snapshot ctk files, on the VMs datastore.
2013-03-29 20:00:42 avvcbimage Info <9692>: a VM snapshot has been requested
2013-03-29 20:00:42 avvcbimage Info <14627>: Creating snapshot 'VDP-13646016429e35cb6f7d88831d272f231fb5dce2f44dba91f0', quieceFS=1
2013-03-29 20:00:42 avvcbimage Info <14631>: create snapshot task still in progress, sleep for 2 sec.
2013-03-29 20:01:05 avvcbimage Warning <16004>: Soap fault detected, Query problem, Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"Connection timed out"
Detail: connect failed in tcp_connect()
'
2013-03-29 20:01:05 avvcbimage Error <14628>: Create snapshot failed for snapshot 'VDP-13646016429e35cb6f7d88831d272f231fb5dce2f44dba91f0'.
2013-03-29 20:01:05 avvcbimage FATAL <14688>: The VMX '[ESXI01_Local_Volume_0] *******/*******.vmx' could not be snapshot.
2013-03-29 20:01:05 avvcbimage Info <0000>: Starting graceful (staged) termination, Create Snapshot failure. (wrap-up stage)
2013-03-29 20:01:05 avvcbimage Error <9759>: createSnapshot: snapshot creation failed
2013-03-29 20:01:05 avvcbimage Info <14696>: snapshot created:false NOMC:false ChangeBlTrackingAvail:true UsingChBl:true, FullBackup=false
2013-03-29 20:01:05 avvcbimage Info <9666>: Available transport modes are file:san:hotadd:nbdssl:nbd
2013-03-29 20:01:05 avvcbimage Info <9667>: Calling ConnectEx with servername=***********************:443 vmxspec=moref=vm-25 on port 0 snapshot()
2013-03-29 20:01:05 avvcbimage Info <9668>: virtual machine will be connected readonly
2013-03-29 20:01:05 avvcbimage Info <9669>: VixDiskLib_ConnectEx returned VIX_OK
2013-03-29 20:01:05 avvcbimage Info <9672>: Disconnected from VM
2013-03-29 20:01:05 avvcbimage Info <16038>: Final summary, cancelled/aborted 0, snapview 2, exitcode 202: plugin error 02
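
To see how long the snapshot request ran before the SOAP fault, the timestamps in a log like the one above can be compared with a short script. This is only a minimal Python sketch, assuming the line layout shown in this excerpt (date, time, `avvcbimage`, severity, `<code>`, message); the function names are made up for illustration, and newer VDP versions that log ISO-style timestamps would need a different pattern.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2013-03-29 20:00:42 avvcbimage Info <9692>: a VM snapshot has been requested
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) avvcbimage (\w+) <(\d+)>: (.*)$"
)

def parse(lines):
    """Yield (timestamp, severity, code, message) for each recognisable line."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            yield ts, m.group(2), m.group(3), m.group(4)

def seconds_until_failure(lines, start_code="9692", fail_code="16004"):
    """Seconds between the first start_code entry and the next fail_code entry."""
    start = None
    for ts, _severity, code, _message in parse(lines):
        if code == start_code and start is None:
            start = ts
        elif code == fail_code and start is not None:
            return (ts - start).total_seconds()
    return None

# Lines copied from the log excerpt in this post.
sample = [
    "2013-03-29 20:00:42 avvcbimage Info <9692>: a VM snapshot has been requested",
    "2013-03-29 20:00:42 avvcbimage Info <14631>: create snapshot task still in progress, sleep for 2 sec.",
    "2013-03-29 20:01:05 avvcbimage Warning <16004>: Soap fault detected, Query problem, Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]",
    "2013-03-29 20:01:05 avvcbimage Error <9759>: createSnapshot: snapshot creation failed",
]

print(seconds_until_failure(sample))  # 23.0
```

Here the fault arrives 23 seconds after the snapshot request, which is the window in which the connection to vCenter was lost.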

After the VDP job fails, I notice that the snapshot is actually there: when I check the Snapshot Manager, I see the "VDP-Job###" snapshot listed.

So, when I rerun the job manually, it removes the VDP snapshot and then begins to re-snapshot the VM. I know these VMs lock up for less than a minute when VDP runs. My suspicion is that when the Virtual Center VM is being snapshotted, its I/O freezes, VDP loses its connection to vCenter Server, and the backup fails. So we got another set of 1TB drives, put them in a RAID-1 configuration, and created another datastore dedicated strictly to VDP backups.

Even after this change, we are still seeing the issue.

Does anyone have any suggestions for getting VDP working properly? I have also redeployed multiple times to make sure it wasn't corruption in VDP itself.

Thank you to anyone who assists!

17 Replies
NvR4GeT
Enthusiast

Also, I can occasionally run a backup job manually and it completes successfully. But when I select all the backup jobs, it usually fails.

65_alessandro
Contributor

Hi,

I see the same results. I tried to check all the connections to the virtual machine with ping during the scheduled job, but when the job started I lost the RDP connection to the Virtual Center server. I suppose there are two possible fixes: increase the timeout setting, or increase the resources of the VC, because when the job starts the VC freezes for a few seconds.
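
A ping only shows ICMP reachability, while the failing call in the log is a TCP connect to vCenter on port 443. The check can be repeated at the TCP level with a small probe. This is a minimal Python sketch, not anything VDP itself runs; the function names and the host name in the usage note are placeholders.

```python
import socket
import time

def tcp_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch(host, port, duration=120.0, interval=1.0, timeout=2.0):
    """Probe host:port repeatedly and return a list of (unix_time, reachable)."""
    results = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        results.append((time.time(), tcp_reachable(host, port, timeout)))
        time.sleep(interval)
    return results

def outages(results):
    """Collapse consecutive failed probes into (first_failure, last_failure) spans."""
    spans, start, last = [], None, None
    for ts, ok in results:
        if not ok:
            if start is None:
                start = ts
            last = ts
        elif start is not None:
            spans.append((start, last))
            start = None
    if start is not None:
        spans.append((start, last))
    return spans
```

Started shortly before the scheduled job, something like `outages(watch("vcenter.example.local", 443, duration=300))` would list the periods during which vCenter did not accept connections, which can then be compared with the timestamps in the avvcbimage log.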

The question is whether it is possible to configure a parameter for the connection timeout.

I've also opened a ticket with support, but so far (after three months) there has been no result.

Bye

Alessandro

NvR4GeT
Enthusiast

Keep me posted on their response if they ever come up with something. What does your licensing look like? Do you have Essentials Plus or above? My understanding is that VDP is not licensed for Essentials, so I was curious what you have licensed. Also, I agree that it would be nice to be able to edit the timeout for the connection to vCenter, as I feel this is the culprit: when VDP begins backing up, the VM freezes, the timeout is exceeded, and the backup fails.

65_alessandro
Contributor

Hi Edinburgh,

I'm sorry, but no. This is the error:

2013-04-30 09:00:13 avvcbimage Info <14627>: Creating snapshot 'VDP-136730521339d61286c9f23678eebdbbec5a7a1cab7a7fde20', quieceFS=1
2013-04-30 09:00:13 avvcbimage Info <14631>: create snapshot task still in progress, sleep for 2 sec.
2013-04-30 09:00:15 avvcbimage Info <14631>: create snapshot task still in progress, sleep for 2 sec.
2013-04-30 09:00:17 avvcbimage Info <14631>: create snapshot task still in progress, sleep for 2 sec.
2013-04-30 09:00:40 avvcbimage Warning <16004>: Soap fault detected, Query problem, Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"Connection timed out"
Detail: connect failed in tcp_connect()
'
2013-04-30 09:00:40 avvcbimage Error <14628>: Create snapshot failed for snapshot 'VDP-136730521339d61286c9f23678eebdbbec5a7a1cab7a7fde20'.
2013-04-30 09:00:40 avvcbimage FATAL <14688>: The VMX '[PVMDSBE1] VC/VC.vmx' could not be snapshot.
2013-04-30 09:00:40 avvcbimage Info <0000>: Starting graceful (staged) termination, Create Snapshot failure. (wrap-up stage)
2013-04-30 09:00:40 avvcbimage Error <9759>: createSnapshot: snapshot creation failed

As you can see, the error comes only a few seconds after the backup starts. For this reason I asked support whether there is a parameter that sets this value. After three months I still have no response!

edinburgh1874
Enthusiast

I thought this might help as it seems to relate to the timeout value for VDP processes...

Maybe you could identify which host the backup is running on at that time, and look at /var/log/hostd.log

Perhaps that might shed some light on what's happening at the ESXi level.
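
To narrow a large hostd.log down to the backup window, a simple filter is usually enough. This is a minimal Python sketch to run against a copy of the log; it assumes each hostd.log line begins with an ISO-style timestamp, and the function name and the example prefix are only illustrative.

```python
def matching_lines(path, keyword="snapshot", time_prefix=""):
    """Yield lines that start with time_prefix and contain keyword (case-insensitive)."""
    needle = keyword.lower()
    with open(path, "r", errors="replace") as fh:
        for line in fh:
            if line.startswith(time_prefix) and needle in line.lower():
                yield line.rstrip("\n")

# Example usage (path and prefix are placeholders):
# for line in matching_lines("hostd.log", time_prefix="2013-04-30T09:00"):
#     print(line)
```

ESXi normally logs in UTC, so the prefix may need to be shifted from the local time shown in the VDP log.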

Cheers

arjanhs
Enthusiast

I'm having the same problem. Have you solved it yet?

NvR4GeT
Enthusiast

Mysteriously enough, VDP at one of our clients is now working 100%. The issue above was occurring there. Our assumption was that the cause was the storage: they hadn't purchased SAS drives and instead had either near-line SAS or even SATA drives. We bought another set of drives, created a separate RAID volume, and installed VDP on that volume so that the I/O would be separated. I then attempted backups, but for some reason they still failed with the same issue. Some time went by before I got back to it, and by then none of the VDP jobs were linked to their VMs anymore. I re-linked all the VDP jobs, and the next time VDP ran them everything was successful. I'm unsure how the link was broken, but re-linking might have been the key to the successful backups. I have been getting good VDP backups for over a month and a half now. Hopefully this helps with your situation.

arjanhs
Enthusiast

What do you mean by re-linking all the VDP jobs? Do you mean recreating them, or re-selecting the VMs within the job?

NvR4GeT
Enthusiast

First, I set up individual jobs for each VM. But when I logged in, I got an error saying that VDP did not know where the VMs' vmdks resided. I had to re-point the jobs to where the VMs were in the datastore. After that, all VDP jobs passed and have continued to pass. I'm unsure how it lost the location of the VMs. I presume you could move the vmdks to make VDP lose the location on purpose, then move them back and have it find them again?

arjanhs
Enthusiast

I have checked the jobs and none of them has lost the location of its VM. The VMs are production machines in daily use, so moving the vmdks isn't an option. I filed an SR for this problem last week but haven't received anything yet. Hopefully VMware comes up with a solution.

arjanhs
Enthusiast

I was able to solve the problem for this particular environment. The problem was the creation of the snapshot: while the snapshot was being created, the virtual machine (a DNS server) wasn't able to respond to requests, and the backup failed. After adding a secondary DNS server and pointing the VDP appliance at it, the snapshots went well and the backups finished successfully.
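
Whether name resolution stalls during the backup window can be checked with a few lines of code. This is a minimal Python sketch using only the standard library; it uses the resolver settings of whatever machine it runs on, so it is only meaningful on a system configured with the same DNS servers as the VDP appliance. The function name is made up for illustration.

```python
import socket
import time

def resolve(hostname, port=443):
    """Try to resolve hostname; return (addresses, seconds_taken, error_text)."""
    started = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
        addresses = sorted({info[4][0] for info in infos})
        error = None
    except socket.gaierror as exc:
        addresses, error = [], str(exc)
    return addresses, time.monotonic() - started, error
```

Calling `resolve()` with the vCenter host name once per second while a snapshot is being taken shows whether lookups fail or slow down when the only DNS server is the VM being snapshotted.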

LMW
Contributor

I encountered the same thing: my logs complained of a SOAP timeout and that the snapshot couldn't be created. This only occurred for ONE of my EIGHT Server 2008 R2 VMs. I looked around on the VDP appliance for some way, in the command line or script being run, to increase the timeout. One KB article seemed like a possibility: VMware KB: vSphere Data Protection backup jobs fail intermittently. However, it did not have an impact.

The one piece of info I learned from just looking was that creating the snapshot for this ONE VM took > 60 seconds. So I went down the other route of trying to "speed up" the snapshot. I somehow ended up on this KB article: VMware KB: Backing up a Windows Server 2008 R2 or Windows Server 2012 virtual machine using vSphere ... When I made the change described in this KB, my snapshot took ~40 seconds and my backup completed successfully.

I hope this helps someone...

-Mat

K_Faisal
Enthusiast

Thank you Mr

Thank You.

raymos
Contributor

Hello everyone,

I am experiencing this same issue right after upgrading to VDP 6.0.

Error looks like this:

2015-07-16T20:00:42.238+04:00 avvcbimage Info <14631>: Snapshot 'VDP-14370912232e6b7b31e31f69b956ad1f2809f7986b727be2da' creation for VM 'blah_blah_blah.vmx' task still in progress, sleep for 2 sec
2015-07-16T20:01:05.245+04:00 avvcbimage Warning <16004>: Soap fault detected, Query problem, Msg:'SOAP 1.1 fault: SOAP-ENV:Client [no subcode]
"Connection timed out"
Detail: connect failed in tcp_connect()
'
2015-07-16T20:01:05.267+04:00 avvcbimage Error <17773>: Snapshot 'VDP-14370912232e6b7b31e31f69b956ad1f2809f7986b727be2da' creation for VM 'blah_blah_blah.vmx' task failed to start

What I find strange is that when I perform manual backups, the backups then work fine FOR ONE DAY.

After that, various and sundry machines start to fail their backups.  There is no rhyme or reason.

My environment has multiple AD controllers & different machines seem to get the issue.  (Oh, the vCenter Server seems to constantly error.)

Has anyone actually heard from VMware or EMC on this issue?

jkoebrunner
Enthusiast

Same problem here with VDP 6.0 on VSAN 6.0: Timeout during snapshot creation for some of our VMs (same log errors)

Another problem is that creating snapshots leads to a short network connection timeout for the VMs, especially vCenter.

I was hoping that with VSAN 6.0, the snapshot performance would be much better...

Any solution yet?

Johannes Köbrunner IT Solutions Architect Virtualization, Network and Storage Systems Frequentis AG VTSP, VCP, VCAP-DCD

RyanJMN
Enthusiast

How much read and write latency do you see on your storage during backups? Disabling quiescing could possibly help if a crash-consistent backup is sufficient. VDP requires a decent amount of disk performance. If it's sharing storage with the VMs being backed up, you could try moving VDP to its own storage or getting a Data Domain.

VMware KB: Backing up vCenter Server with vSphere Data Protection (VDP) fails with error: Soap fault...
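
One way to judge how much quiescing costs on a given VM is to time a manual snapshot with and without it. This is a minimal Python sketch, assuming `vm` is a pyVmomi `vim.VirtualMachine` object obtained elsewhere (connection code is not shown) and that the task state can be read as the strings `success` and `error`. The function name, snapshot names, and polling values are placeholders, and this is not how VDP itself takes its snapshots.

```python
import time

def time_snapshot(vm, name, quiesce, poll_seconds=2.0, max_wait=300.0):
    """Create a snapshot of vm and return (final_state, seconds_taken).

    vm is assumed to expose pyVmomi's CreateSnapshot_Task method.
    """
    started = time.monotonic()
    task = vm.CreateSnapshot_Task(
        name=name,
        description="timing test",
        memory=False,      # skip the memory dump
        quiesce=quiesce,   # True asks VMware Tools to quiesce the guest
    )
    while True:
        state = str(task.info.state)
        elapsed = time.monotonic() - started
        # Stop on a finished task, or give up once max_wait is exceeded.
        if state in ("success", "error") or elapsed > max_wait:
            return state, elapsed
        time.sleep(poll_seconds)
```

Comparing `time_snapshot(vm, "test-quiesced", True)` with `time_snapshot(vm, "test-crash", False)` on the problem VM gives a rough idea of whether quiescing accounts for the delay. The test snapshots should be deleted afterwards.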
