VMware Cloud Community
darrenoid
Enthusiast

Requests Failing After 7.0.1 Upgrade with Event Broker Timeout

Hello,

Ever since we upgraded from vRA 7.0 to 7.0.1, our requests have been failing. VMware support has spent a few weeks working with us without any progress. I thought I would post here in case anyone has ideas on things to check or has seen this before.

I think we have isolated the problem to communication between the Event Broker Service and the Manager Service. Following the post "Event Broker Service Timeout", we confirmed that our requests go through when the Event Broker Service is turned off. Also per that post, we verified that the IaaS components all have the proper certificates and chain (we use internally signed Microsoft CA certificates). We rely on event broker subscriptions, so keeping the service disabled is not an option as a workaround.

To summarize the problem: we submit a catalog request and it stays stuck in the Requested state for hours. We receive the error "Timed out waiting for event broker response", and eventually the request fails. To make the failures surface faster, we lowered the event broker service timeout from the default 30 minutes to 5 minutes. Here is what we are seeing in the logs.

From the VAMI logs:

[Image: EBSFailure.jpg]

From the catalina.out logs on the appliance:

2016-06-08 12:54:21,527 vcac: [component="cafe:advanced-designer" priority="ERROR" thread="serviceSubscribeAmqpTaskExecutor-38542" tenant="" context="8gGhr7z8" token="LPsW1gb7"] com.vmware.vcac.core.service.event.AmqpServerSubscribeService.handleMessage:254 - Error when handle messasge for subscription 'com.vmware.csp.iaas.blueprint.service.machine.lifecycle.provision-b-vRO_proxy' with headers '{ebs.targetId=83662e2b-a3d5-4f03-8b1c-d17e3186b6d3, content-length=1985, ebs.eventId=cae1e7d0-2db2-11e6-d735-4feb01c67056, ebs.tenantId=DEPT01, trace-id=8gGhr7z8, ebs.topic.blockable=true, ebs.correlationId=40440602-bf9b-4e05-9d4d-987cb7081057, ebs.eventTopicId=com.vmware.csp.iaas.blueprint.service.machine.lifecycle.provision, id=7cfa40aa-9efe-7169-613c-718ee55819af, ebs.targetType=machine, ebs.eventType=event, timestamp=1465415661464}'. Cause '403 Forbidden'

2016-06-08 12:54:21,986 vcac: [component="cafe:advanced-designer" priority="ERROR" thread="queue-pool-executer-1" tenant="" context="8gGhr7z8" token="Uxgp3pDO"] com.vmware.vcac.platform.service.integration.ErrorRequestListenerActivator.onErrorMessageRequest:43 - Failed message with id [7cfa40aa-9efe-7169-613c-718ee55819af] accepted for error processing.
Error Message: [403 Forbidden].
Message: [GenericMessage [payload=byte[1985], headers={content-length=1985, ebs.eventId=cae1e7d0-2db2-11e6-d735-4feb01c67056, ebs.correlationId=40440602-bf9b-4e05-9d4d-987cb7081057, ebs.targetType=machine, ebs.targetId=83662e2b-a3d5-4f03-8b1c-d17e3186b6d3, ebs.tenantId=DEPT01, trace-id=8gGhr7z8, ebs.topic.blockable=true, ebs.eventTopicId=com.vmware.csp.iaas.blueprint.service.machine.lifecycle.provision, id=7cfa40aa-9efe-7169-613c-718ee55819af, ebs.eventType=event, timestamp=1465415661464}]]
org.springframework.web.client.HttpClientErrorException: 403 Forbidden
        at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:91) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
        at com.vmware.vcac.platform.rest.client.error.ResponseErrorHandler.handleError(ResponseErrorHandler.java:61) ~[platform-rest-client-7.0.1-SNAPSHOT.jar:?]
        at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:641) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
        at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:597) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
        at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:557) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
        at org.springframework.web.client.RestTemplate.put(RestTemplate.java:409) ~[spring-web-4.2.4.RELEASE.jar:4.2.4.RELEASE]
        at com.vmware.vcac.platform.rest.client.impl.RestClientImpl.put(RestClientImpl.java:445) ~[platform-rest-client-7.0.1-SNAPSHOT.jar:?]
        at com.vmware.vcac.platform.rest.client.services.AbstractService.put(AbstractService.java:157) ~[platform-rest-client-7.0.1-SNAPSHOT.jar:?]
        at com.vmware.vcac.core.event.broker.rest.client.service.EventService.publishReplyEvent(EventService.java:53) ~[event-broker-client-rest-service-7.0.1-SNAPSHOT.jar:?]
        at com.vmware.vcac.workflow.event.service.impl.ReplyEventSenderImpl.publishReplyEvent(ReplyEventSenderImpl.java:110) ~[classes/:?]
        at com.vmware.vcac.workflow.event.service.impl.ReplyEventSenderImpl.sendUnblockEventPropagationSignal(ReplyEventSenderImpl.java:46) ~[classes/:?]
        at com.vmware.vcac.workflow.event.service.impl.SubscribeWorkflowServiceImpl$WorkflowSubscriptionEventListener.onEvent(SubscribeWorkflowServiceImpl.java:248) ~[classes/:?]
        at com.vmware.vcac.workflow.event.service.impl.SubscribeWorkflowServiceImpl$WorkflowSubscriptionEventListener.onEvent(SubscribeWorkflowServiceImpl.java:190) ~[classes/:?]
        at com.vmware.vcac.core.service.event.AmqpServerSubscribeService$AmqpServerSubscribeMessageHandler.handleMessage(AmqpServerSubscribeService.java:251) ~[service-registry-config-7.0.1-SNAPSHOT.jar:?]
        at org.springframework.integration.endpoint.PollingConsumer.handleMessage(PollingConsumer.java:103) [spring-integration-core-4.2.0.RELEASE.jar:?]
        at org.springframework.integration.endpoint.AbstractPollingEndpoint.doPoll(AbstractPollingEndpoint.java:251) [spring-integration-core-4.2.0.RELEASE.jar:?]
        at org.springframework.integration.endpoint.AbstractPollingEndpoint.access$000(AbstractPollingEndpoint.java:57) [spring-integration-core-4.2.0.RELEASE.jar:?]
        at org.springframework.integration.endpoint.AbstractPollingEndpoint$1.call(AbstractPollingEndpoint.java:176) [spring-integration-core-4.2.0.RELEASE.jar:?]
        at org.springframework.integration.endpoint.AbstractPollingEndpoint$1.call(AbstractPollingEndpoint.java:173) [spring-integration-core-4.2.0.RELEASE.jar:?]
        at org.springframework.integration.endpoint.AbstractPollingEndpoint$Poller$1.run(AbstractPollingEndpoint.java:330) [spring-integration-core-4.2.0.RELEASE.jar:?]
        at org.springframework.integration.util.ErrorHandlingTaskExecutor$1.run(ErrorHandlingTaskExecutor.java:55) [spring-integration-core-4.2.0.RELEASE.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_72]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_72]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_72]
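
When comparing many such entries, it can help to pull the ebs.* headers and the cause out of each catalina.out line so that failures can be grouped by subscription, tenant, or correlation ID. The following is only a rough sketch in Python: the function name and the regular expressions are my own, written against the single log line quoted above, and other vRA versions may format these lines differently.

```python
import re

# Patterns derived from the one log line quoted in this post (an assumption,
# not a documented log format).
HEADERS_RE = re.compile(r"with headers '\{(.*?)\}'")
CAUSE_RE = re.compile(r"Cause '([^']*)'")


def parse_ebs_error(line):
    """Return (headers, cause) extracted from an EBS subscription error line."""
    headers = {}
    match = HEADERS_RE.search(line)
    if match:
        # Headers are rendered as "key=value" pairs separated by ", ".
        for pair in match.group(1).split(", "):
            key, _, value = pair.partition("=")
            headers[key.strip()] = value.strip()
    cause_match = CAUSE_RE.search(line)
    cause = cause_match.group(1) if cause_match else None
    return headers, cause


# Shortened copy of the first log entry above (same values, fewer headers).
SAMPLE = (
    "Error when handle messasge for subscription "
    "'com.vmware.csp.iaas.blueprint.service.machine.lifecycle.provision-b-vRO_proxy' "
    "with headers '{ebs.targetId=83662e2b-a3d5-4f03-8b1c-d17e3186b6d3, "
    "ebs.tenantId=DEPT01, ebs.topic.blockable=true, "
    "ebs.correlationId=40440602-bf9b-4e05-9d4d-987cb7081057}'. "
    "Cause '403 Forbidden'"
)

headers, cause = parse_ebs_error(SAMPLE)
print(headers["ebs.tenantId"], headers["ebs.correlationId"], cause)
```

In the trace above, the 403 is raised on the REST PUT issued by EventService.publishReplyEvent, so grouping by ebs.correlationId is one way to check whether every blockable event fails at the reply step or only some of them.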

At this point we are all out of ideas...

Regards,

Darrenoid


Accepted Solutions
darrenoid
Enthusiast

Sorry for the late reply on this thread.

I ended up redeploying a whole new 7.0.1 environment from scratch to get past this issue because VMware support was taking a very long time. I kept the old environment up out of curiosity and continued working with VMware support on the issue. Eventually their engineers tracked it down. Unfortunately, I lost my notes on the issue.

Basically, the "solution user" changed at some point in time. When the first event broker subscription is created in vRA, an entry is added to the PostgreSQL database in one of the ebs_ tables. As far as I understand, the entry records that vRA is now using event broker subscriptions and which solution user it will use. From then on, all subscriptions run as the solution user listed in that table entry. When the solution user changes, the table is not updated. To resolve the issue, we deleted all event subscriptions from vRA, manually deleted that row from the database, and then re-added our subscriptions. Everything was working again after that.
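
The exact table name is not given above. One read-only way to narrow it down is to list the tables whose names start with ebs_ using the standard information_schema catalog. The sketch below only builds that query; the function name is hypothetical, the %s placeholder assumes a psycopg2-style driver, and connecting to the vRA PostgreSQL database is left out. It does not identify or delete the row, which should only be done with guidance from VMware support.

```python
def ebs_table_query(prefix="ebs_"):
    """Build (sql, params) listing tables whose names start with `prefix`.

    Hypothetical helper: it only constructs a read-only query and never
    touches the database itself.
    """
    # "_" and "%" are wildcards in LIKE, so escape them. Backslash is the
    # default LIKE escape character in PostgreSQL.
    escaped = (
        prefix.replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    sql = (
        "SELECT table_schema, table_name "
        "FROM information_schema.tables "
        "WHERE table_name LIKE %s "
        "ORDER BY table_schema, table_name"
    )
    return sql, (escaped + "%",)


sql, params = ebs_table_query()
print(sql)
print(params)
```

With a driver such as psycopg2, the pair would be passed as cursor.execute(sql, params); each candidate table can then be inspected with a plain SELECT before anything is changed.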

ShanVMLand
Expert

Hi,

I am having the same issue. Did you fix it? Can you please share your experience?

If you found this information useful, please consider awarding points for "Correct" or "Helpful". Thanks!!!
alexc673
Contributor

Hi ShanVMLand,

I have the same issue that you had.

How did you solve the problem?

Could you please let me know?

Regards

Alex

alexc673
Contributor

Hi darrenoid,

I have the same issue that you had.

Could you please tell me how you solved it?

Regards

Alex

ShanVMLand
Expert

Hi Alex,

I fixed the issue. My environment was vRA 7.1 with an F5 load balancer (distributed install). We modified the Manager Service config file and added the following XML entry:

<add key="Extensibility.Client.RetrievalMethod" value="Polling"/>

The following helped me to fix the issue:

vRealize Automation 7.1 Information Center: http://pubs.vmware.com/vrealize-automation-71/index.jsp?topic=%2Fcom.vmware.vrealize.automation.doc%...

Change the Polling Method for Certificates

If you use commas in the OU section of the IaaS certificate, you may encounter STOMP WebSocket errors in the Manager Service log files and virtual machine provisioning may fail. You can remove the commas or change the polling method from WebSocket to HTTP to resolve these issues.

See Manager Service for more information about the Manager Service.
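
For anyone who wants to script the change rather than edit the file by hand, here is a minimal sketch in Python. It assumes the entry belongs in the appSettings section, which is the usual home for <add key="..." value="..."/> entries in a .NET config file; the function name and the sample XML are made up for illustration, and the real Manager Service config file will contain much more. The sketch works on a string and ElementTree drops XML comments on parsing, so back up the real file before applying anything like this.

```python
import xml.etree.ElementTree as ET

KEY = "Extensibility.Client.RetrievalMethod"


def set_retrieval_method(config_xml, value="Polling"):
    """Return config_xml with the retrieval-method key set to `value`."""
    root = ET.fromstring(config_xml)
    # Assumption: the key lives under <appSettings>; create it if missing.
    app_settings = root.find("appSettings")
    if app_settings is None:
        app_settings = ET.SubElement(root, "appSettings")
    for entry in app_settings.findall("add"):
        if entry.get("key") == KEY:
            entry.set("value", value)  # key already present: update in place
            break
    else:
        ET.SubElement(app_settings, "add", {"key": KEY, "value": value})
    return ET.tostring(root, encoding="unicode")


# Made-up stand-in for the Manager Service config file.
SAMPLE = """<configuration>
  <appSettings>
    <add key="SomeOtherSetting" value="1" />
  </appSettings>
</configuration>"""

updated = set_retrieval_method(SAMPLE)
print(updated)
```

Running the function twice on the same input leaves a single entry for the key, so it is safe to re-apply. A Manager Service restart is presumably needed for the change to take effect, as with other changes to that file.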

alexc673
Contributor

Hi ShanVMLand,

Thank you very much for the information.

Have a good day.

Cheers

Alex
