Solved: Create virtualmachine snapshot - 100% for 40mins

AHuntington · ‎04-14-2017

Hey Guys,

Got a really odd one here, for the last week I noticed all VM backups (Veeam 9.5) blew out by 35-45mins each. The problem is with 70+ VMs this pushed us outside the backup Window. In terms of troubleshooting so far we haven't got far, I've reviewed the vCenter logs and rebooted EVERYTHING. What we found from this is that when we reboot vCenter we have no issues for about 15 snapshots. The second thing we tried was snapshots from vCenter (rather than Veeam) but we have the same issue and it's only in creation, not removal. No matter how large or small the VM they all take the same time (even a 5GB CentOS Server) roughly 35-45mins. All tha past tasks looking at the vCeneter logs were between 10-130seconds for the

From what i can see in the vCenter logs there is a timeout peroid that is hit causing the task to be canceled/stopped but this is not causing a failure or any issues, everything still works fine. Its just takes 40mins longer per VM for any snapshots.

Here are the the vCenter logs:

2017-04-14T23:07:39.287+10:00 error vpxd[05084] [Originator@6876 sub=MoEvent opID=4569d102] [EventManagerMoImpl::EventBatchAppender::EventAppenderJob::PostJob] VirtualCenter did not respond to the request to append events after 2400 seconds

2017-04-14T23:07:39.291+10:00 error vpxd[05084] [Originator@6876 sub=Vmomi opID=4569d102] [VpxVmomi] IO exception while sending fault

2017-04-14T23:07:39.291+10:00 info vpxd[05084] [Originator@6876 sub=vpxLro opID=4569d102] [VpxLRO] -- FINISH task-internal-27952

2017-04-14T23:07:39.291+10:00 info vpxd[05084] [Originator@6876 sub=Default opID=4569d102] [VpxLRO] -- ERROR task-internal-27952 -- EventManager -- vim.event.EventManager.logUserEvent: vmodl.fault.UnexpectedFault:

--> Result:

--> (vmodl.fault.UnexpectedFault) {

--> faultCause = (vmodl.MethodFault) null,

--> faultName = "vim.fault.Timedout",

--> fault = (vmodl.MethodFault) null,

--> msg = ""

--> }

--> Args:

-->

--> Arg entity:

--> 'vim.VirtualMachine:vm-2741'

--> Arg msg:

--> "Source: Veeam Backup

--> Action: Job "Backup"

--> Operation: Stopped

--> Status: Success

--> ID: 9444fd99-6cb7-45c0-90b7-cead7533a244

--> Description: "

2017-04-14T23:07:39.297+10:00 warning vpxd[05316] [Originator@6876 sub=VpxProfiler opID=HB-host-19@28497-1d592c26] VpxdInvtVm::ProcessVmInfo [ProcessVmInfoTime] took 1477703 ms

2017-04-14T23:07:39.297+10:00 warning vpxd[05316] [Originator@6876 sub=VpxProfiler opID=HB-host-19@28497-1d592c26] VpxdInvtVm::ProcessVmChanges [ProcessVmChangesTime] took 1477703 ms

2017-04-14T23:07:39.300+10:00 warning vpxd[05316] [Originator@6876 sub=VpxProfiler opID=HB-host-19@28497-1d592c26] [VpxdHostSync] ProcessChanges host:syd-sy1-ex02.cloud.local [ProcessChanges] took 1477706 ms

2017-04-14T23:07:39.301+10:00 warning vpxd[05316] [Originator@6876 sub=VpxProfiler opID=HB-host-19@28497-1d592c26] [VpxdHostSync] DoHostSync:0000000004A81420 [DoHostSyncTime] took 1478249 ms

2017-04-14T23:07:39.301+10:00 warning vpxd[05316] [Originator@6876 sub=VpxProfiler opID=HB-host-19@28497-1d592c26] InvtHostSyncLRO::StartWork [HostSyncTime] took 1478249 ms

2017-04-14T23:07:39.301+10:00 warning vpxd[05316] [Originator@6876 sub=VpxProfiler opID=HB-host-19@28497-1d592c26] VpxLro::LroMain [TotalTime] took 1478249 ms

Going bonkers here guys, not idea whats going on. Thanks in advance for any help

I've created a project on Freelancer also:vmware vcenter | VMware

Cheer.

AHuntington · ‎07-08-2017

UPDATE: Fixed but not solved.

Hi Everyone,

So I ended up taking a bit of a sledgehammer approach to the issue as time to troubleshoot was running thin.

Actions:

- Re-indexed vPostgreSQL

- Backup VCDB in PostgreSQL with python (https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=20919... )

- Restored vCenter from two years ago (before any trace of DB issues)

- Disabled vCenter service

- Reset VM password (https://msdn.microsoft.com/en-us/powershell/reference/5.1/microsoft.powershell.management/reset-comp... )

- Patched and updated

- Restored VCDB (as per link above)

- Started vCenter

It's been a few days and seems stable so I'm putting in the done pile but will update if it pops back up.

View solution in original post

Sakkanan · ‎04-14-2017

delay may be due to issue with your storage. Check your storage controller and logs for any hardware issue on the storage.

AHuntington · ‎04-14-2017

No, the issue is apparent across multiple storage arrays and DAS. Also it's temporary resolved following the vCenter reboot and has only been apparent in the last few weeks after 24 months without issue. If it was a storage issue i'd expect to see more symptoms like other VMs on that array with poor performance stats, pauses or failures.

There is no adverse effect aside from the elongated snapshot task aside from pushing out the backup windows. Seeing as snapshot creation is fairly light on IO, or any resources it seems like something in vCenter is hanging pr missing the bus.

Sakkanan · ‎04-14-2017

Is there any changes done on the environment?

version of Vsphere and ESXI host Hardware model?

AHuntington · ‎04-14-2017

No changes.

ESXi 6.0.0, 3825889

vCenter 6.0 Update 2

Flex x240 M5

EMC XtreamIO

Dell Compellent SC9000

All firmware is current.

AHuntington · ‎04-24-2017

UPDATE: The issue is temporarily resolved 12-24Hrs by restarting vCenter (whole Windows VM) but then the problem returns.

inkjet2000 · ‎04-25-2017

I'm suffering through the exact same problem that has been plaguing me since rolling out Veeam 9.5 Update 1. We're also on vCenter 6 Update 2, and this was also happening while running update 1. I'm seeing very similar error messages as you.

2017-04-17T08:15:41.545-04:00 info vpxd[36060] [Originator@6876 sub=vpxLro opID=668fcc94] [VpxLRO] -- FINISH task-internal-1128758

2017-04-17T08:15:41.545-04:00 info vpxd[36060] [Originator@6876 sub=Default opID=668fcc94] [VpxLRO] -- ERROR task-internal-1128758 -- EventManager -- vim.event.EventManager.logUserEvent: vmodl.fault.UnexpectedFault:

--> Result:

--> (vmodl.fault.UnexpectedFault) {

--> faultCause = (vmodl.MethodFault) null,

--> faultName = "vim.fault.Timedout",

--> fault = (vmodl.MethodFault) null,

--> msg = ""

--> }

--> Args:

-->

--> Arg entity:

--> 'vim.VirtualMachine:vm-2015'

--> Arg msg:

--> "Source: Veeam Backup

--> Action: Job "REDACTED"

--> Operation: Stopped

--> Status: Failed

--> ID: 5f4d48d2-937a-4f0c-b020-d00977c8212f

--> Description: "

2017-04-17T08:15:41.694-04:00 info vpxd[30920] [Originator@6876 sub=vpxLro opID=opId-2def0694-2ff3-4489-a88c-ef3dc2d2f599-62-a4] [VpxLRO] -- BEGIN task-internal-1141421 -- ServiceInstance -- vim.ServiceInstance.GetServerClock -- 520eebdd-3dbe-31ea-dbb9-99f45108c628(528fa8c7-e19b-34ce-9f7f-5add55cf5e2f)

2017-04-17T08:15:41.696-04:00 info vpxd[30920] [Originator@6876 sub=vpxLro opID=opId-2def0694-2ff3-4489-a88c-ef3dc2d2f599-62-a4] [VpxLRO] -- FINISH task-internal-1141421

2017-04-17T08:15:41.702-04:00 info vpxd[40616] [Originator@6876 sub=vpxLro opID=opId-0b1361a7-b76c-4aaa-9c21-30b4058aec6b-57-16] [VpxLRO] -- BEGIN task-internal-1141422 -- ServiceInstance -- vim.ServiceInstance.GetServerClock -- 52a39e5c-8380-7745-cab2-0ec8bfdd8ccf(52fb4d07-2a3d-94b9-84f2-aab1aba4c5f0)

2017-04-17T08:15:41.704-04:00 info vpxd[40616] [Originator@6876 sub=vpxLro opID=opId-0b1361a7-b76c-4aaa-9c21-30b4058aec6b-57-16] [VpxLRO] -- FINISH task-internal-1141422

2017-04-17T08:15:41.722-04:00 info vpxd[41648] [Originator@6876 sub=vpxLro opID=opId-82e7e45d-a6c4-49df-90bd-fd98a430f8b1-68-79] [VpxLRO] -- BEGIN task-internal-1141423 -- ServiceInstance -- vim.ServiceInstance.GetServerClock -- 527f1498-f872-9c49-6ff0-34f156d67cc9(52eb3d4b-f9dc-5810-67f8-10551a25df17)

2017-04-17T08:15:41.722-04:00 info vpxd[41648] [Originator@6876 sub=vpxLro opID=opId-82e7e45d-a6c4-49df-90bd-fd98a430f8b1-68-79] [VpxLRO] -- FINISH task-internal-1141423

2017-04-17T08:15:52.176-04:00 error vpxd[40044] [Originator@6876 sub=MoEvent opID=4793cbf4] [EventManagerMoImpl::EventBatchAppender::EventAppenderJob::PostJob] VirtualCenter did not respond to the request to append events after 2400 seconds

2017-04-17T08:15:52.193-04:00 error vpxd[40044] [Originator@6876 sub=Vmomi opID=4793cbf4] [VpxVmomi] IO exception while sending fault

2017-04-17T08:15:52.193-04:00 info vpxd[40044] [Originator@6876 sub=vpxLro opID=4793cbf4] [VpxLRO] -- FINISH task-internal-1128855

2017-04-17T08:15:52.193-04:00 info vpxd[40044] [Originator@6876 sub=Default opID=4793cbf4] [VpxLRO] -- ERROR task-internal-1128855 -- EventManager -- vim.event.EventManager.logUserEvent: vmodl.fault.UnexpectedFault:

--> Result:

--> (vmodl.fault.UnexpectedFault) {

--> faultCause = (vmodl.MethodFault) null,

--> faultName = "vim.fault.Timedout",

--> fault = (vmodl.MethodFault) null,

--> msg = ""

--> }

--> Args:

-->

--> Arg entity:

--> 'vim.VirtualMachine:vm-1620'

--> Arg msg:

--> "Source: Veeam Backup

--> Action: Job "REDACTED"

--> Operation: Stopped

--> Status: Success

--> ID: f03e5766-4385-4c11-92f3-a1f766849829

--> Description: "

The only errors I see before this are regarding the Web server.

2017-04-17T07:51:25.553-04:00 error vpxd[37000] [Originator@6876 sub=HttpSvc.HTTPService] Failed to read request; stream: <io_obj p:0x000000002e11caa8, h:-1, <pipe '\\.\pipe\vmware-vpxd-webserver-pipe'>, <pipe '\\.\pipe\vmware-vpxd-webserver-pipe'>>, error: class Vmacore::TimeoutException(Operation timed out)

2017-04-17T07:51:25.554-04:00 error vpxd[38936] [Originator@6876 sub=HttpSvc.HTTPService] Failed to read request; stream: <io_obj p:0x00000000176be8f8, h:-1, <pipe '\\.\pipe\vmware-vpxd-webserver-pipe'>, <pipe '\\.\pipe\vmware-vpxd-webserver-pipe'>>, error: class Vmacore::TimeoutException(Operation timed out)

The next errors after these messages involve failures with Veeam job information that match your output. Do you happen to also see these HTTP pipe failures in your logs?

AHuntington · ‎04-28-2017

HTTP pipe errors in Veeam or vCenter logs?have the same problem with manual snapshots through vCenter

I don't think it's a Veeam issue as I have the same problem when creating snapshots from vCenter directly.

inkjet2000 · ‎04-28-2017

Those pipe errors are from the VPXD logs. Veeam is showing errors in it's log saying that it's failing to disconnect it's session due to a forbidden message from vCenter. I believe this is causing stale sessions to build which may be causing the vcenterServer service to crash. I'll post more here as we investigate.

RubalSingh · ‎06-14-2017

Hi,

I am facing the same issue with snapshot delays and after VC reboot it got fixed for a while but popped up again. If you found any update , please let me know.

Thanks in advance.

SK21090 · ‎06-15-2017

did you check vmware.log file ?

Please consider awarding points for "Correct" or "Helpful" replies. Thanks....!!! VCAP-DCV | VCP-DCV | VCP-NV | MCSA

RubalSingh · ‎06-16-2017

Yes, i checked and found same type of logs mentioned in above.

SK21090 · ‎06-16-2017

okie,

Did you try to take snapshot individually by login to standalone ESXI? How did it go?

could you also check the resource utilization of vCenter?

Please consider awarding points for "Correct" or "Helpful" replies. Thanks....!!! VCAP-DCV | VCP-DCV | VCP-NV | MCSA

RubalSingh · ‎06-16-2017

Manual snapshot is working fine. But in VPXD.log seeing below timeout errors a lot

2017-06-14T14:45:00.605-05:00 error vpxd[06776] [Originator@6876 sub=SSL SoapAdapter.HTTPService] Failed to read request; stream: <SSL(<io_obj p:0x0000000043a45b68, h:-1, <TCP '0.0.0.0:0'>, <TCP '0.0.0.0:0'>>)>, error: class Vmacore::TimeoutException(Operation timed out)

Failed to accept connection; <acceptor p:0x0000000011ab65e0, h:5380, <TCP '127.0.0.1:8085'>>, e: system:995(The I/O operation has been aborted because of either a thread exit or an application request)

Failed to connect socket; <io_obj p:0x0000000104eb1e38, h:1056, <TCP '0.0.0.0:0'>, <TCP '127.0.0.1:8088'>>, e: system:10061(No connection could be made because the target machine actively refused it)

AHuntington · ‎06-19-2017

Snapshot from the ESXi host directly takes about 43ms

Utilisation on vCenter

- CPU 5% (6 Cores)

- RAM 60% (12GB)

- Disk <10ms latency

I found these errors in vPostgres but not sure if related.

2017-06-19 00:07:01.792 AEST 594688f4.59cc 0 ERROR: left link changed unexpectedly in block 75 of index "vpxi_task_complete_time"

2017-06-19 00:07:01.792 AEST 594688f4.59cc 0 CONTEXT: automatic vacuum of table "VCDB.vc.vpx_task"

AHuntington · ‎06-19-2017

2017-06-20T08:57:38.094+10:00 info vpxd[05472] [Originator@6876 sub=vpxLro opID=6e73ffa9] [VpxLRO] -- FINISH task-internal-1952981

2017-06-20T08:57:38.097+10:00 info vpxd[04964] [Originator@6876 sub=vpxLro opID=18ed54da] [VpxLRO] -- BEGIN task-internal-1952982 -- host-19 -- vim.HostSystem.GetConfig -- 52881301-3cac-4455-93c3-7dbde1b1914a(52b2533d-00e0-ef7c-3cca-2808258c3075)

2017-06-20T08:57:38.181+10:00 info vpxd[04964] [Originator@6876 sub=vpxLro opID=18ed54da] [VpxLRO] -- FINISH task-internal-1952982

2017-06-20T08:57:38.266+10:00 error vpxd[17356] [Originator@6876 sub=SoapAdapter.HTTPService] Failed to read request; stream: <io_obj p:0x000000000b06fa58, h:4936, <TCP '127.0.0.1:8085'>, <TCP '127.0.0.1:59501'>>, error: class Vmacore::SystemException(An existing connection was forcibly closed by the remote host)

2017-06-20T08:57:38.850+10:00 info vpxd[00960] [Originator@6876 sub=vpxLro opID=4597126d] [VpxLRO] -- BEGIN task-internal-1952984 -- session[527c9b0a-5c45-a878-3047-2391bdfb72c5]52e2df17-65d7-b429-ba54-6cb3c9090e68 -- vmodl.query.PropertyCollector.Filter.destroy -- 527c9b0a-5c45-a878-3047-2391bdfb72c5(520ab8a4-fffc-1087-c2e2-c8528b600451)

2017-06-20T08:57:38.850+10:00 info vpxd[00960] [Originator@6876 sub=vpxLro opID=4597126d] [VpxLRO] -- FINISH task-internal-1952984

2017-06-20T08:57:38.911+10:00 info vpxd[07412] [Originator@6876 sub=vpxLro opID=6cb36ffe] [VpxLRO] -- BEGIN task-internal-1952986 -- SearchIndex -- vim.SearchIndex.findByDatastorePath -- 527c9b0a-5c45-a878-3047-2391bdfb72c5(520ab8a4-fffc-1087-c2e2-c8528b600451)

2017-06-20T08:57:38.913+10:00 info vpxd[07412] [Originator@6876 sub=vpxLro opID=6cb36ffe] [VpxLRO] -- FINISH task-internal-1952986

2017-06-20T08:57:40.035+10:00 info vpxd[06332] [Originator@6876 sub=vpxLro opID=394edd24] [VpxLRO] -- BEGIN task-internal-1952988 -- session[527c9b0a-5c45-a878-3047-2391bdfb72c5]523ea4a8-848b-47ed-ccc0-6cdab202b110 -- vmodl.query.PropertyCollector.Filter.destroy -- 527c9b0a-5c45-a878-3047-2391bdfb72c5(520ab8a4-fffc-1087-c2e2-c8528b600451)

2017-06-20T08:57:40.035+10:00 info vpxd[06332] [Originator@6876 sub=vpxLro opID=394edd24] [VpxLRO] -- FINISH task-internal-1952988

2017-06-20T08:57:40.085+10:00 info vpxd[09148] [Originator@6876 sub=vpxLro opID=360ac313] [VpxLRO] -- BEGIN task-internal-1952990 -- SearchIndex -- vim.SearchIndex.findByDatastorePath -- 527c9b0a-5c45-a878-3047-2391bdfb72c5(520ab8a4-fffc-1087-c2e2-c8528b600451)

2017-06-20T08:57:40.087+10:00 info vpxd[09148] [Originator@6876 sub=vpxLro opID=360ac313] [VpxLRO] -- FINISH task-internal-1952990

2017-06-20T08:57:41.334+10:00 error vpxd[05364] [Originator@6876 sub=MoEvent opID=b769632] [EventManagerMoImpl::EventBatchAppender::EventAppenderJob::PostJob] VirtualCenter did not respond to the request to append events after 2400 seconds (AKA 40mins)

2017-06-20T08:57:41.338+10:00 error vpxd[05364] [Originator@6876 sub=Vmomi opID=b769632] [VpxVmomi] IO exception while sending fault

2017-06-20T08:57:41.338+10:00 info vpxd[05364] [Originator@6876 sub=vpxLro opID=b769632] [VpxLRO] -- FINISH task-internal-1947434

2017-06-20T08:57:41.338+10:00 info vpxd[05364] [Originator@6876 sub=Default opID=b769632] [VpxLRO] -- ERROR task-internal-1947434 -- EventManager -- vim.event.EventManager.postEvent: vmodl.fault.UnexpectedFault:

--> Result:

--> (vmodl.fault.UnexpectedFault) {

--> faultCause = (vmodl.MethodFault) null,

--> faultName = "vim.fault.Timedout",

--> fault = (vmodl.MethodFault) null,

--> msg = ""

--> }

--> Args:

-->

--> Arg eventToPost:

--> (vim.event.EventEx) {

--> key = 6201771,

--> chainId = 6201771,

--> createdTime = "2017-06-19T22:17:41.334532Z",

--> userName = "username",

--> datacenter = (vim.event.DatacenterEventArgument) {

--> name = "datacenter",

--> datacenter = 'vim.Datacenter:f52e-512d-4251-a59c-6538f60cd274:datacente'

--> },

--> computeResource = (vim.event.ComputeResourceEventArgument) null,

--> host = (vim.event.HostEventArgument) null,

--> vm = (vim.event.VmEventArgument) null,

--> ds = (vim.event.DatastoreEventArgument) null,

--> net = (vim.event.NetworkEventArgument) null,

--> dvs = (vim.event.DvsEventArgument) null,

--> fullFormattedMessage = <unset>,

--> changeTag = <unset>,

--> eventTypeId = "com.vmware.vshield.edge.gateway.HealthcheckNoPulse",

--> severity = "info",

--> message = <unset>,

--> objectId = "datacenter-2",

--> objectType = "vim.Datacenter",

--> objectName = "datacenter",

--> fault = (vmodl.MethodFault) null,

--> }

--> Arg taskInfo:

-->

2017-06-20T08:57:42.777+10:00 info vpxd[12248] [Originator@6876 sub=vpxLro opID=PHS-4bbf1bbc-ec73-4420-8473-179617980566-1a-de-e7] [VpxLRO] -- BEGIN task-internal-1952993 -- domain-c25 -- vim.ComputeResource.GetHost -- 521c16b8-9819-78c4-2265-88bbd0a043c0(524a732b-ac43-3d0e-1477-2bdfdc7699b4)

2017-06-20T08:57:42.778+10:00 info vpxd[12248] [Originator@6876 sub=vpxLro opID=PHS-4bbf1bbc-ec73-4420-8473-179617980566-1a-de-e7] [VpxLRO] -- FINISH task-internal-1952993

SK21090 · ‎06-20-2017

vPostgress can't be the issue here as u mentioned that issue is the same while directly login to ESXI host and taking snapshot took same time

so the issue can be either the patch level of ESXi, Storage or if the VM's are real too busy to allow snapshot

do you have any local disk on ESXI to test the snapshot location change ?

Please consider awarding points for "Correct" or "Helpful" replies. Thanks....!!! VCAP-DCV | VCP-DCV | VCP-NV | MCSA

AHuntington · ‎06-20-2017

Snapshot:

vCenter = 40mins

ESXi = <40ms

So on ESXi directly it's almost instant.

SK21090 · ‎06-20-2017

ohkie i see now . could you create a dummy VM without configuring in veeam and take snapshot to check how it goes?

Please consider awarding points for "Correct" or "Helpful" replies. Thanks....!!! VCAP-DCV | VCP-DCV | VCP-NV | MCSA

AHuntington · ‎06-20-2017

Just created a new blank VM with a 1GB Thin disk and its the same issue