omatsei1
Contributor

Can't Add the vSphere Agent?

I'm installing vRA for the first time, and running into some trouble adding the vCenter endpoint.

During the install, I added a vSphere agent to all 3 of my IaaS servers, and gave it the name "vhol" (for virtual hands-on labs). Now I'm trying to add the endpoint, and even though I call it "vhol", it says "Test connection failed: The vSphere agent does not exist or may not be running." I've rebooted both the IaaS servers and the vRA appliances, with no change. I also verified that there is a service on all three called "VMware vCloud Automation Center Agent - vhol" and that it's started, but still nothing. I also tried adding the endpoint using the exact same credentials, and different credentials, and naming it "vCenter", even though that wasn't the name I chose during the install, but still nothing. I even tried installing a brand new agent on all 3 IaaS servers, called something completely different, but it still says the vSphere agent does not exist or may not be running. Does anyone have any suggestions?

9 Replies
npadmani
Virtuoso

During the install, I added a vSphere agent to all 3 of my IaaS servers, and gave it the name "vhol"

If that is what you did, what are these 3 IaaS servers? Did you do a distributed install of the IaaS components? Please add a little more information so we can understand the setup.

Which version of vRA did you deploy?

Copy and paste the exact error message that you see in Infrastructure -> Monitoring -> Logs, or take a screenshot of it and post it here.

Also take the vsphereagent.log file from all three agents and upload the Manager Service logs as well, so we can better understand what might be going wrong. (Search Google for "log location of vRA".)

I also tried adding the endpoint using the exact same credentials, and different credentials, and naming it "vCenter", even though that wasn't the name I chose during the install, but still nothing.

When you install the agent "vhol" with the endpoint name also set to "vhol", create a vSphere endpoint with the same name in the vRA configuration.

Try disabling the Windows firewall as well.

Narendra Padmani VCIX6-DCV | VCIX7-CMA | VCI | TOGAF 9 Certified
omatsei1
Contributor

I'm running the newest version, 7.3. All 3 IaaS servers have all the roles installed, with the first being the active manager service node.

There are several errors or warnings that appear in the Infrastructure - Monitoring - Logs section. I'm not entirely sure which ones are even relevant, since it's a brand new install.

The one that appears as soon as I try to test the endpoint is:

This exception was caught:

The attached endpoint 'vhol.fqdn.domain.com' cannot be found.

There's another error at the same time that says:

Ping Failure :

The HTTP request is unauthorized with client authentication scheme 'Anonymous'. The authentication header received from the server was 'Negotiate,NTLM'.

Inner Exception: The remote server returned an error: (401) Unauthorized.

I can ping all the servers from everything too... I'm not sure why it says ping failure. I'd rather not disable the firewall completely, for a variety of reasons, but I have added exceptions for all the ports in the install guide. I also see a different error that seems to appear once for each IaaS server, every 10 seconds. It says:

Could not connect to the Manager Service. The service might be offline or restarting. If this error persists, verify the that ManagerService.exe is running and that the endpoint, https://<ip address>/VMPS2Proxy, can be reached System.ServiceModel.Security.MessageSecurityException: The HTTP request is unauthorized with client authentication scheme 'Anonymous'. The authentication header received from the server was 'Negotiate,NTLM'. ---> System.Net.WebException: The remote server returned an error: (401) Unauthorized.
   at System.Net.HttpWebRequest.GetResponse()
   at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)
   --- End of inner exception stack trace ---
Server stack trace:
   at System.ServiceModel.Channels.HttpChannelUtilities.ValidateAuthentication(HttpWebRequest request, HttpWebResponse response, WebException responseException, HttpChannelFactory`1 factory)
   at System.ServiceModel.Channels.HttpChannelUtilities.ValidateRequestReplyResponse(HttpWebRequest request, HttpWebResponse response, HttpChannelFactory`1 factory, WebException responseException, ChannelBinding channelBinding)
   at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)
   at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.SecurityChannelFactory`1.SecurityRequestChannel.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
   at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at DynamicOps.Vrm.Agent.Core.ProxyAgentService.VMPSProxyAgent.GetWorkItem(String agentID)
   at DynamicOps.Vrm.Agent.Core.Communication.VRMCommunication.GetWorkitem(String AgentName).

I don't know if it's relevant, but the instance names are vhol-iaas1, vhol-iaas2, and vhol-iaas3.

daphnissov
Immortal

All 3 IaaS servers have all the roles installed, with the first being the active manager service node.

Why did you settle on this architecture? Having three IaaS servers is fine, but you don't break out copies of all the services on all the nodes. Are you also using load balancers? Please describe your full vRA architecture. What version of vCenter are you using for your vSphere endpoint?

The one that appears as soon as I try to test the endpoint is:

This exception was caught:

The attached endpoint 'vhol.fqdn.domain.com' cannot be found.

You must have an endpoint name of vhol.fqdn.domain.com in your agent configuration or you'll see this message. The names must match from what you specified in the installation wizard and what you create as a vSphere agent in vRA.
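To make the matching rule concrete, here is a minimal Python sketch that compares the endpoint name an agent was installed with against the endpoint names created in vRA. The function name and the sample values are hypothetical (the values are lifted from this thread), and whether vRA compares names case-sensitively is not stated here, so the sketch assumes exact matching and reports a case-only near miss separately.

```python
def check_endpoint_name(agent_endpoint_name, vra_endpoint_names):
    """Return a short diagnosis for one agent's configured endpoint name.

    agent_endpoint_name: the endpoint name given to the agent at install time.
    vra_endpoint_names: names of the vSphere endpoints created in vRA.
    Both inputs are assumed to have been collected by hand.
    """
    if agent_endpoint_name in vra_endpoint_names:
        return "match"
    # Flag names that differ only by letter case as a likely typo.
    lowered = {name.lower() for name in vra_endpoint_names}
    if agent_endpoint_name.lower() in lowered:
        return "case-only mismatch"
    return "no match"


# The name from the error message versus the endpoint name used in vRA.
print(check_endpoint_name("vhol.fqdn.domain.com", ["vhol"]))
print(check_endpoint_name("vhol", ["vhol"]))
```

With the values from the error message, the first call reports no match, which is consistent with the "cannot be found" exception.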

Ping Failure :

The HTTP request is unauthorized with client authentication scheme 'Anonymous'. The authentication header received from the server was 'Negotiate,NTLM'.

Inner Exception: The remote server returned an error: (401) Unauthorized.

This doesn't literally mean PING but a hello-type check. If it's returning 401 then there are auth issues. Did you use the installer wizard?

Could not connect to the Manager Service. The service might be offline or restarting. If this error persists, verify the that ManagerService.exe is running and that the endpoint, https://<ip address>/VMPS2Proxy, can be reached System.ServiceModel.Security.MessageSecurityException: The HTTP request is unauthorized with client authentication scheme 'Anonymous'. The authentication header received from the server was 'Negotiate,NTLM'.

Once again, 401 messages, so authentication is not happening properly.
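The reasoning behind reading those 401 entries can be sketched in a few lines of Python: the client sent the request with one authentication scheme, and the server advertised a different set. The helper names are hypothetical, and the header parsing is deliberately naive; it assumes a plain comma-separated list of schemes, which is how the value appears in the log.

```python
def offered_schemes(auth_header):
    """Split a value like 'Negotiate,NTLM' into a list of scheme names.

    Naive on purpose: assumes a comma-separated list of bare scheme names
    with no parameters, as seen in the log entry in this thread.
    """
    return [part.strip() for part in auth_header.split(",") if part.strip()]


def explain_401(client_scheme, auth_header):
    """Describe why a request using client_scheme got a 401, if it did."""
    schemes = offered_schemes(auth_header)
    accepted = {s.lower() for s in schemes}
    if client_scheme.lower() in accepted:
        return f"'{client_scheme}' is offered by the server; look elsewhere"
    return (f"client used '{client_scheme}' but the server only offers "
            f"{', '.join(schemes)}")


# Values copied from the log entry quoted above.
print(explain_401("Anonymous", "Negotiate,NTLM"))
```

In other words, the caller is not presenting Windows credentials while the server side accepts only Negotiate or NTLM, so the mismatch is on the authentication configuration rather than on network reachability.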

I'd recommend you go through the installation process again but use a more common vRA architecture and then take screenshots of each installer phase.

omatsei1
Contributor

Why did you settle on this architecture? Having three IaaS servers is fine, but you don't break out copies of all the services on all the nodes. Are you also using load balancers? Please describe your full vRA architecture. What version of vCenter are you using for your vSphere endpoint?

I'll answer in reverse order. I'm using vCenter 6.5 U1 (build 5973321).

I have a load balancer in front, with 2 vRA appliances, with another load balancer and 3 IaaS servers. IaaS1 is the primary manager service, and the others are the backups. All 3 have all of the roles installed on them for redundancy. I've never used or installed vRA before, but from my reading, it's relatively easy to add more servers if necessary, so I decided on this architecture based on the presumed relative ease of installation, while including all the elements necessary for adding more servers in the future if necessary. This is currently a proof-of-concept setup, although if we decide to move forward, we need it to be extensible enough to work for whatever we throw at it later.

You must have an endpoint name of vhol.fqdn.domain.com in your agent configuration or you'll see this message. The names must match from what you specified in the installation wizard and what you create as a vSphere agent in vRA.

So... I know this. I never created an agent called "vhol.fqdn.domain.com". I don't know where the hell that came from. The only time I've EVER typed that in, is the endpoint configuration itself, where it asks for the host URL. The name of the agent is "vhol". The name of the endpoint I'm trying to create is "vhol". In the course of my testing, I have ALSO tried installing an agent called "vCenter", since apparently that's the default name for a vSphere agent, but when I try to add an endpoint called "vCenter", it gave me the same error.

This doesn't literally mean PING but a hello-type check. If it's returning 401 then there are auth issues. Did you use the installer wizard?

Yes, I did. Anonymous auth is disabled, but Windows auth is enabled in IIS, just like both the installer and the installation guide say it should be.

I'd recommend you go through the installation process again but use a more common vRA architecture and then take screenshots of each installer phase.

I was unable to find any documentation that has a "common" vRA architecture. The minimal is too minimalistic and can't be expanded, as per the guide. The "distributed" architecture has no redundancy for the vRA appliance itself. The "large distributed and load balanced" architecture is far too large for what we need, and will ever need.

According to the install guide, the "distributed" architecture has all the roles installed on a single IaaS server, so logically, there can't be any conflict between the roles being installed on a single server. The "large distributed" architecture has the roles spread out so much that for 3 servers for each role, you'd need a minimum of 15 servers in addition to the vRA appliances themselves. If that's the kind of architecture you'd recommend, then we won't move forward with the proof-of-concept at all since it'd be FAR too expansive to maintain. Can you recommend an architecture somewhere between the two?

npadmani
Virtuoso

omatsei1, you might want to refer to the following documents.

https://docs.vmware.com/en/vRealize-Automation/7.3/vrealize-automation-73-reference-architecture.pdf

https://docs.vmware.com/en/vRealize-Automation/7.3/vrealize-automation-73-installation-and-configura...

https://docs.vmware.com/en/vRealize-Automation/7.3/vrealize-automation-73-configuration.pdf

Looking at the entire discussion, I feel what you are doing here is more of a self-study, which is fine as long as you have enough time. But if you are planning to use this in a production environment, begin by engaging some consultants (VMware PSO, perhaps). For a better understanding of vRA, try to attend the vRA 7.x ICM course if possible. This will clear up many doubts.

Narendra Padmani VCIX6-DCV | VCIX7-CMA | VCI | TOGAF 9 Certified
omatsei1
Contributor

Yeah... I feel like this whole thing has gotten derailed. I have read all those documents, multiple times in fact. I looked through all of them, and a couple more, looking for the answers to these questions but haven't found any. I've also sent these, and other, questions to our VMware account team, but haven't heard anything back yet. So far, my assessment of vRA is that it's simultaneously too documented, and not documented well enough, like most Cisco products to be honest. 

So here are my questions:

1. Do any of the roles (active or passive Web / Manager Service, DEM, or various Agents) conflict with each other in a way that requires them to be on separate VMs?

2. If those roles do not conflict with each other, is there any reason NOT to host all 3 of those things on a single VM?

3. If those roles do not conflict with each other, and there's no specific reason NOT to host all 3 on a single VM, then what's the problem with my architecture doing so?

daphnissov
Immortal

If you're looking for an architecture that is good for a PoC but will also let you move into production, you may want to go with an HA-ready architecture. In this architecture, you have a single vRA appliance fronted by a load balancer, which can either be a real LB or DNS records that simulate the VIP. The latter is what I usually see in these types of scenarios. You then have 3 IaaS boxes where the roles are all broken out but not redundant (one for web; one for manager and DEM Orchestrator; one for DEM Worker and Agent). Again, LBs or DNS records front the web and manager boxes. This gives you a PoC environment with sufficient resources for some large deployments, while still allowing you to scale it out to make it production ready.

As far as your vSphere endpoint goes, check the name of your Windows service, as it'll confirm the agent name.
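As a small illustration, the agent name can be read back from the service display name. The sketch below is Python and assumes the display name follows the pattern quoted in the first post ("VMware vCloud Automation Center Agent - vhol"); the function name is hypothetical, and the service names are assumed to have been copied by hand from each IaaS server.

```python
# Prefix taken from the service name quoted in the original post.
SERVICE_PREFIX = "VMware vCloud Automation Center Agent - "


def agent_name_from_service(display_name):
    """Return the agent name embedded in a service display name, or None."""
    if display_name.startswith(SERVICE_PREFIX):
        return display_name[len(SERVICE_PREFIX):]
    return None


print(agent_name_from_service("VMware vCloud Automation Center Agent - vhol"))
```

Note that this only recovers the agent name; it says nothing about the endpoint name the agent was installed with.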

But as I said, it would be far more helpful for troubleshooting to know the parameters you used when setting this up. Hopefully you used the wizard for everything including remediating the IaaS servers for their dependencies, but if you didn't then your 401 issues could be there somewhere. It'd also be helpful to have a larger snippet of your Infrastructure log as well as details on your IaaS servers, your domain, networking layout, etc. vRA is still a complicated beast, and while the wizard introduced in 7.0 makes installation worlds easier than it used to be, it still leaves enough latitude to end up with a broken CMP sometimes.

omatsei1
Contributor

Hopefully you used the wizard for everything including remediating the IaaS servers for their dependencies, but if you didn't then your 401 issues could be there somewhere.

I didn't use the wizard to remediate everything, but everything it found, I remediated it manually. I generally don't trust wizards to do stuff like that, but if the recommendation is to do so, I'll try that on the next build. One of the main reasons I'm so hesitant to rebuild everything is the giant pain that it was to manually remediate everything, so that'll make it easier. I'll try to start the install again, going exactly by the "large distributed" architecture. To be clear, that means 3 load balancer IP's (one for vRA, one for IaaS Web, and one for IaaS Manager Service), 2-3 vRA appliances, and 8 Windows servers (2 each for Web, Manager Service, DEM, and Agents), right?

daphnissov
Immortal

I didn't use the wizard to remediate everything, but everything it found, I remediated it manually. I generally don't trust wizards to do stuff like that, but if the recommendation is to do so, I'll try that on the next build.

In this case, the wizard does a good job. I'd recommend you use the wizard and let it remediate your IaaS boxes. The reason is that there are so many components that have to be installed and configured properly, and when people do that manually they almost always leave out pieces, which leads to cryptic errors like the ones you're seeing. Deploy a series of vanilla Windows Server 2012 R2 boxes, put the management agent on them, and let the wizard handle the rest.

To be clear, that means 3 load balancer IP's (one for vRA, one for IaaS Web, and one for IaaS Manager Service), 2-3 vRA appliances, and 8 Windows servers (2 each for Web, Manager Service, DEM, and Agents), right?

What I suggested in the HA-ready design is a fully-distributed but not a redundant deployment. So yes on the LBs, but the nodes are:

1 x Café appliance

1 x IaaS web

1 x IaaS Manager

1 x IaaS DEM/Agent

If you want to go whole hog right out of the gate, then yes, you can just do the full enterprise deployment like the reference architecture document specifies. I will tell you that in the dozens of vRA implementations I've done, especially for PoCs or pilots, the minimal install has been good enough; you can come back around once you've gotten familiar with the workings of vRA and do the enterprise deployment that you mentioned. The choice is yours, but if you want a quick and easy way to get up and running with sufficient scale to prove out the solution, the minimal will be good enough.
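For a side-by-side view of the two layouts discussed in this thread, here is a short Python sketch that tallies the node counts. The numbers come straight from the posts above; the appliance count for the large layout was given as "2-3", so 2 is used here as an assumed lower bound, and the dictionary keys are just labels, not product terms.

```python
# Node counts as stated in this thread. For the large layout the appliance
# count was given as "2-3"; 2 is assumed here as the lower bound.
LAYOUTS = {
    "HA-ready": {
        "vRA appliance": 1,
        "IaaS Web": 1,
        "IaaS Manager": 1,
        "IaaS DEM/Agent": 1,
    },
    "large distributed": {
        "vRA appliance": 2,
        "IaaS Web": 2,
        "IaaS Manager Service": 2,
        "DEM": 2,
        "Agents": 2,
    },
}


def count_nodes(layout):
    """Return (appliances, windows_servers) for one layout."""
    appliances = layout.get("vRA appliance", 0)
    # Every role other than the appliance runs on a Windows server.
    windows = sum(n for role, n in layout.items() if role != "vRA appliance")
    return appliances, windows


for name, layout in LAYOUTS.items():
    appliances, windows = count_nodes(layout)
    print(f"{name}: {appliances} appliance(s), {windows} Windows server(s)")
```

That works out to 3 Windows servers for the HA-ready layout against 8 for the large distributed one, before counting load balancer addresses.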