Re: Failed to execute task: INNER - VIO 3.0

TDRoy · ‎10-18-2016

Here is the message:

"You may use 'retry' if there is no change for your deployment. Task execution failed: Task failed on the following nodes: ['xx.xx.xx.xx]. Refer logs for more details."

Looking at ansible.log:

Failed: xx.xx.xx.xx "elapsed": 300, "failed" : true

msg: Timeout when waiting for xx.xx.xx.xx:9696

Looking at jarvis.log:

ERROR: Task failed on the following nodes: [xx.xx.xx.xx] Refer to logs

So from what I see, there is an issue with the IP or the node which uses this IP? Is this a networking issue, or something with deploying the OpenStack pieces themselves?

Any help would be appreciated

Thank you

lserpietri · ‎10-19-2016

Hi TDRoy!

So it looks like Ansible times out when it's waiting for Neutron (which listens on port 9696) to start on the Controller node (or the Load Balancer in case of a Compact mode deployment).
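
To reproduce by hand what the Ansible task is waiting for, a small port probe can be run from the management server or another node. This is only a minimal sketch: the function name `port_is_open` is made up for illustration, and the address is a placeholder for whatever node the log reports.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the TCP handshake and raises OSError
        # (refused, unreachable, timed out, unresolvable) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (placeholder address, replace with the node from ansible.log):
# print(port_is_open("xx.xx.xx.xx", 9696))
```

If this returns False while the node itself answers ping, the service behind the port is most likely not running, which points back to the Neutron log rather than to the network.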

On the Controller node, check the /var/log/neutron/neutron-server.log: it would have the information on why Neutron failed to start. Usually that's because it's not able to establish connection with the metadata proxy routers when using NSX as a network provider. If that's the case, please ensure that the Metadata proxy routers are all up and running when the deployment gets to that point.

One suggestion would be to check whether the cluster where you'll be deploying the NSX Edges has enough resources (storage, memory, vCPUs) to support the Edge pool that gets deployed along with VIO.

Hope this helps!
Luca

TDRoy · ‎10-19-2016

Hi Luca,

Thank you for the response.

We are not using NSX. We are just trying to deploy and bring up an environment in Compact Mode with VDS networking.

I will check the neutron logs and report back.

Thank you

TDRoy · ‎10-19-2016

On the ControlPlane machine, using Compact Deployment, in the below location I see this:

/var/log/neutron/neutron-server.log and log.1

ERROR neutron ConnectionError: HTTPSConnectionPool ... Max retries exceeded with url: /sdk/vimService.wsdl ...Failed to establish a new connection: [Errno -2] Name or service not known',))

We are not using NSX, yet. We are deploying in Compact Mode.

ssurana · ‎10-19-2016

Seems like the control plane VM cannot reach the vCenter Server.

Can you ssh into the nodes, and try reaching the VC?
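
One way to "try reaching the VC" from a node, sketched under some assumptions: `[Errno -2] Name or service not known` is what the resolver reports when a hostname cannot be resolved, and the failing request in the Neutron log went over HTTPS, so the check below resolves the name first and then attempts a TCP connection on port 443 (assumed default, not SSH). The function name `diagnose_endpoint` and the example hostname are hypothetical.

```python
import socket


def diagnose_endpoint(host: str, port: int = 443, timeout: float = 3.0) -> str:
    """Report whether host resolves and accepts TCP connections on port."""
    try:
        # Step 1: name resolution. A failure here matches the
        # "Name or service not known" error seen in neutron-server.log.
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns-failure: {host}: {exc}"

    address = infos[0][4][0]
    try:
        # Step 2: TCP handshake against the first resolved address.
        with socket.create_connection((address, port), timeout=timeout):
            return f"ok: {host} -> {address}:{port}"
    except OSError as exc:
        return f"connect-failure: {address}:{port}: {exc}"


# Example (hypothetical FQDN, use the vCenter name given to VIO):
# print(diagnose_endpoint("vcenter.example.local"))
```

A `dns-failure` result suggests checking the DNS servers configured on the node; a `connect-failure` suggests routing or firewalling between the node and vCenter.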

enekux · ‎10-20-2016

We are also using DVS instead of NSX...

It looks like the two recent cases are showing similar symptoms... I started a discussion, "Compact mode installation fails on Post Deployment".

I have connected over SSH to the OMS... but... what do you mean by trying to reach the VC?

Does SSH access to the VC need to be open? Currently it is closed...

Any ideas?

TDRoy · ‎10-21-2016

This is interesting. I see the following:

ControlPlane and Compute VM are deployed.

The Compute VM can reach the vCenter and the ControlPlane.

The ControlPlane can reach the Compute VM but not the vCenter.

I see that traffic (ping/traceroute) is going out via the API network NIC and not the Mgmt NIC by default. If I run a ping through the Mgmt NIC, I still cannot reach the VC.

This tells me that networking IS working, since the CP and Compute VM are on the same mgmt VLAN and one of them can hit VC.
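
The observation about which NIC the traffic leaves from can be checked without ping. The sketch below is illustrative only (IPv4, hypothetical function names): the first helper asks the kernel which local address it would use to reach a destination, and the second attempts a TCP connection that is explicitly bound to a chosen source address, such as the Mgmt NIC's IP.

```python
import socket


def source_ip_for(destination: str, port: int = 443) -> str:
    """Return the local IPv4 address the routing table selects for destination."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends nothing; it only fixes the route,
        # so getsockname() reveals the chosen source address.
        sock.connect((destination, port))
        return sock.getsockname()[0]


def can_reach_from(source_ip: str, destination: str, port: int = 443,
                   timeout: float = 3.0) -> bool:
    """Try a TCP connection to destination:port bound to source_ip."""
    try:
        with socket.create_connection((destination, port), timeout=timeout,
                                      source_address=(source_ip, 0)):
            return True
    except OSError:
        return False


# Example (placeholders: vCenter address and the Mgmt NIC address):
# print(source_ip_for("xx.xx.xx.xx"))
# print(can_reach_from("yy.yy.yy.yy", "xx.xx.xx.xx"))
```

If `source_ip_for` returns the API network address for the vCenter destination, the default route is sending that traffic out of the API NIC, which matches the ping/traceroute behaviour described above.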

As the user below mentions, what kind of communication needs to be allowed from the CP to the VC? Is it SSH? Some open port?

Any help would be appreciated.

enekux · ‎10-25-2016

Hello guys,

TDRoy, I wonder if you have made any progress?

I am stuck... so no fun with VIO...

TDRoy · ‎10-25-2016

No progress. Still have issues reaching the vCenter from my Control Plane.

I'm trying to reinstall with a new vCenter.

TDRoy · ‎10-25-2016

Alright.

I have built a new vCenter exclusively for this environment.

vCenter IP - VLAN aaa

Manager Server IP - VLAN aaa

Management Network - VLAN aaa

API Network - VLAN bbb

The issue previously seemed to be that vCenter could not be reached from the Control Plane. Now every host/node/vCenter can talk to each other and the installation is still failing.

I thought that maybe DNS was not working properly, so I had the management server call vCenter using the IP and not the FQDN. The deployment seemed to progress further, but ultimately failed with the same error.

In the logs I now see this:

ansible.log

(all failures on image/glance related TASKS)

TASK: Check if image already exists...

failed: [xx.xx.xx.xx] {"censored": "results hidden due to no_log parameter", "changed"
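
To see at a glance which tasks failed on which nodes, the log can be summarised with a short script. This is a rough sketch that assumes the line shapes quoted above (`TASK: ...` followed by `failed: [host] ...`); the real ansible.log may carry timestamps or other prefixes, which is why the patterns search within each line. The function name `failed_tasks` is made up for illustration.

```python
import re
from collections import defaultdict

# Patterns based on the excerpt above; adjust if the real log differs.
TASK_RE = re.compile(r"TASK:\s*(.+)")
FAILED_RE = re.compile(r"failed:\s*\[([^\]]+)\]")


def failed_tasks(lines):
    """Map each task name to the list of hosts that reported 'failed'."""
    failures = defaultdict(list)
    current_task = "<unknown task>"
    for line in lines:
        task = TASK_RE.search(line)
        if task:
            # Remember the most recent task header.
            current_task = task.group(1).strip()
            continue
        failed = FAILED_RE.search(line)
        if failed:
            failures[current_task].append(failed.group(1))
    return dict(failures)


# Example (path is an assumption, use wherever ansible.log lives):
# with open("ansible.log") as handle:
#     for task, hosts in failed_tasks(handle).items():
#         print(task, hosts)
```

Note that tasks marked with `no_log` hide their result details, so the summary only shows that they failed, not why; the Glance logs on the node would be the next place to look.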

I will check to see if there is another Glance/Image post-deployment thread in here. Any suggestions are welcome.

TDRoy · ‎10-26-2016

Most recent update:

Although the management server failed to finish the deployment, I was able to get onto the mgmt server and use viocli to start the OpenStack services. I was then able to get to the Dashboard and upload an image, etc.

I see, however, that the networks were not created, and there is still an issue in the logs with the Glance/image tasks.

Any info would be great. I will be re-deploying the environment shortly.
