Compact mode installation fails on Post Deployment

Hello,

I am having problems installing VIO in Compact Mode. In the vSphere Web Client I get the following error:

Failed to execute task. INNER

You may use 'retry' if there is no change for your deployment. Task execution failed: Task failed on the following nodes: ['IP_ADDRESS']. Refer logs for more details.

Here are the logs:

#sudo viocli deployment configure

...

Configuring post deployment VIO...

post-deployment failed on the following nodes: ['IP_ADDRESS'].

# sudo viocli deployment post-deploy

Configuring post deployment VIO...

post-deployment failed on the following nodes: ['IP_ADDRESS'].

# tail /var/log/jarvis/ansible.log

...

2016-10-12 15:08:01,386 p=328 u=jarvis | TASK: [post-deployment | import image into glance] ****************************

2016-10-12 15:08:11,185 p=328 u=jarvis | failed: [IP_ADDRESS] => {"censored": "results hidden due to no_log parameter", "changed": true, "rc": 1}

2016-10-12 15:08:11,186 p=328 u=jarvis | FATAL: all hosts have already failed -- aborting

vSphere Recent Tasks:

Deploy OVF template

191d76c0-c384-49e9-bcec-f57b247de63b

Operation timed out.

VSPHERE.LOCAL\Administrator

Hi,

I have also noticed that the status of the 2 nodes is: Bootstrap Failed...

In the meantime, I have also double-checked read/write access to the datastores provisioned for Glance from the Management Cluster, and it all looks OK. I don't have any specific firewall rules in place...

I am running out of ideas... is anybody else having any similar issue?

Thank you,

Hi!

Run viocli show -i to see the latest inventory file used to configure VIO Management Plane: are the IP addresses for the loadbalancer and compute node showing up there?

Thanks!
Luca

Thank you,

yes they are:

[compute]

IP_COMPUTE_NODE cluster_name=OpenStack_Compute_Cluster datastore_regex="Silver01" hostname=compute01 cluster_moid=domain-c957

[lb]

INTERNAL_IP_LOAD_BALANCER hostname=loadbalancer01

[controller]

INTERNAL_IP_LOAD_BALANCER hostname=loadbalancer01

[memcache]

INTERNAL_IP_LOAD_BALANCER hostname=loadbalancer01

[storage]

INTERNAL_IP_LOAD_BALANCER hostname=loadbalancer01

[db]

INTERNAL_IP_LOAD_BALANCER hostname=loadbalancer01

[mq]

INTERNAL_IP_LOAD_BALANCER hostname=loadbalancer01

[dhcp]

INTERNAL_IP_LOAD_BALANCER hostname=loadbalancer01
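The check asked for above (do the load balancer and compute groups have addresses in the inventory?) can be sketched in a few lines of Python. This is only an illustration: it assumes the output of viocli show -i has been saved as text, the helper names parse_inventory and missing_groups are made up for this example, and the sample is a shortened copy of the listing above.

```python
from collections import defaultdict


def parse_inventory(text):
    """Map each [group] header to the list of host addresses listed under it."""
    groups = defaultdict(list)
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            groups[current]  # register the group even if it stays empty
            continue
        if current is None or current.endswith(":vars"):
            continue  # skip variable sections such as [all:vars]
        # The first whitespace-separated token is the host address.
        groups[current].append(line.split()[0])
    return dict(groups)


def missing_groups(groups, required=("lb", "compute")):
    """Return the required groups that have no host entry."""
    return [name for name in required if not groups.get(name)]


# Shortened copy of the inventory posted above (placeholders kept as-is).
sample = """
[compute]
IP_COMPUTE_NODE cluster_name=OpenStack_Compute_Cluster hostname=compute01

[lb]
INTERNAL_IP_LOAD_BALANCER hostname=loadbalancer01

[controller]
INTERNAL_IP_LOAD_BALANCER hostname=loadbalancer01
"""

groups = parse_inventory(sample)
print(groups)
print("missing:", missing_groups(groups))
```

An empty "missing" list only says the entries exist; it does not prove the addresses are reachable.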

Hi!

Can you deploy a template or create a VM on the selected datastores?

Hi,

sorry for the delay... I missed the notification...

Yes, I can create VMs in that datastore... I have also tried different datastores and different clusters, but the error always happens when trying to deploy the Ubuntu image for Glance...

I am really stuck here... In the past, with version 2.5, where the requirements for a VIO deployment were very high, I worked around them by using nested ESXi hosts/clusters, and it was OK.

Now in Compact Mode it fails... Is anyone else deploying in Compact Mode without issues?

Thank you and regards,

Hi again,

Thanks to your recent comment in the other discussion, "Failed to execute task: INNER - VIO 3.0", I have done the following:

In the VIO management console I re-ran "post-deploy" with "viocli":

root@viomgmt:/home/viouser# viocli deployment post-deploy

Configuring post deployment VIO...

post-deployment failed on the following nodes: ['INTERNAL_IP_LOADBALANCER'].

On the controller node, I checked the "/var/log/glance/glance-api.log" log file (see attached file):

root@loadbalancer01:/home/viouser# tail -f /var/log/glance/glance-api.log

And I get some errors like:

....

2016-10-19 08:46:21.331 5689 ERROR glance.api.v2.image_data [req-c39bd1b9-fb66-4dea-8dde-1b4f2b5e7c8e 446efa263c0d4600ba8aeda9c44126bd 727b925cdb4046c5889a232099c67896 - - -] Failed to upload image data due to internal error

....

2016-10-19 08:46:21.331 5689 ERROR glance.api.v2.image_data NewConnectionError: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f362e529590>: Failed to establish a new connection: [Errno 13] EACCES

2016-10-19 08:46:21.331 5689 ERROR glance.api.v2.image_data

2016-10-19 08:46:21.364 5689 DEBUG oslo_messaging._drivers.amqpdriver [req-c39bd1b9-fb66-4dea-8dde-1b4f2b5e7c8e 446efa263c0d4600ba8aeda9c44126bd 727b925cdb4046c5889a232099c67896 - - -] CAST unique_id: 82cbbacb8ca0466dbaf8cbe4baae122e NOTIFY exchange 'glance' topic 'notifications.info' _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:438

2016-10-19 08:46:21.366 5689 ERROR glance.common.wsgi [req-c39bd1b9-fb66-4dea-8dde-1b4f2b5e7c8e 446efa263c0d4600ba8aeda9c44126bd 727b925cdb4046c5889a232099c67896 - - -] Caught error: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f362e529590>: Failed to establish a new connection: [Errno 13] EACCES

...

2016-10-19 08:46:21.366 5689 ERROR glance.common.wsgi NewConnectionError: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f362e529590>: Failed to establish a new connection: [Errno 13] EACCES

2016-10-19 08:46:21.366 5689 ERROR glance.common.wsgi

....

Any ideas?

Thank you,
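To get an overview of which connection errors dominate a log like the one above, the errno codes can be tallied with a short script. This is a rough sketch, not a VIO tool: the function name count_errnos is made up here, and the sample lines are abbreviated copies of the glance-api.log excerpt in this post.

```python
import errno
import os
import re
from collections import Counter

# Matches fragments such as "[Errno 13] EACCES".
ERRNO_RE = re.compile(r"\[Errno (\d+)\] (\w+)")


def count_errnos(lines):
    """Count the (number, name) errno pairs found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = ERRNO_RE.search(line)
        if match:
            counts[(int(match.group(1)), match.group(2))] += 1
    return counts


# Abbreviated lines from the glance-api.log excerpt above.
sample = [
    "2016-10-19 08:46:21.331 5689 ERROR glance.api.v2.image_data Failed to upload image data due to internal error",
    "2016-10-19 08:46:21.331 5689 ERROR glance.api.v2.image_data NewConnectionError: Failed to establish a new connection: [Errno 13] EACCES",
    "2016-10-19 08:46:21.366 5689 ERROR glance.common.wsgi Caught error: Failed to establish a new connection: [Errno 13] EACCES",
    "2016-10-19 08:46:21.366 5689 ERROR glance.common.wsgi NewConnectionError: Failed to establish a new connection: [Errno 13] EACCES",
]

counts = count_errnos(sample)
for (number, name), seen in sorted(counts.items()):
    # Look the symbolic name up locally; numeric values differ between platforms.
    code = getattr(errno, name, None)
    meaning = os.strerror(code) if code is not None else "unknown on this platform"
    print(f"[Errno {number}] {name}: seen {seen}x ({meaning})")
```

EACCES is the POSIX "permission denied" error, so here the outgoing HTTPS connection from glance-api is being refused permission locally rather than timing out on the network.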

hello,

I moved from 2.5 to 3.0 in Compact Mode with NSX, and I have exactly the same problem and the same log.

Is there a solution? I have an installation planned at a client site in a few days 😞

Thank you and regards,

uhm, is the OMS certificate in shape? By that I mean, has it been issued to the correct FQDN?

To check, run openssl s_client -connect localhost:9443 -showcerts from the OMS and check whether the subject name and the subject alternative name are the OMS's.

Cheers!
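The name comparison suggested above can also be expressed in Python. This sketch assumes the certificate is already available as the dictionary that ssl.SSLSocket.getpeercert() returns; that call only returns a populated dictionary when the certificate was validated, so with default self-signed certificates the certificate would have to be trusted first. The helper names and the host name oms.example.local are hypothetical, and wildcard names are not handled.

```python
def cert_names(cert):
    """Collect commonName and subjectAltName values from a getpeercert()-style dict."""
    names = set()
    # "subject" is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.add(value.lower())
    # "subjectAltName" is a tuple of (type, value) pairs.
    for kind, value in cert.get("subjectAltName", ()):
        if kind in ("DNS", "IP Address"):
            names.add(value.lower())
    return names


def issued_to(cert, fqdn):
    """Return True if fqdn appears literally among the certificate's names."""
    return fqdn.lower() in cert_names(cert)


# Hypothetical certificate data, shaped like the output of getpeercert().
example_cert = {
    "subject": ((("commonName", "oms.example.local"),),),
    "subjectAltName": (("DNS", "oms.example.local"), ("IP Address", "192.0.2.10")),
}

print(sorted(cert_names(example_cert)))
print(issued_to(example_cert, "oms.example.local"))
print(issued_to(example_cert, "other.example.local"))
```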

I was using VIO 2.5 until a week ago... no problems...

As I wrote in "VIO 3.0 installation HELP!", when I try to import an image (the same problem I encountered during installation), the "Deploy OVF file" operation times out.

I'm using the default installation certificates.

Any help?

Try the following and let us know the result.

1> ping all the ESXi hosts used by compute from loadbalancer01, and make sure you can reach them.

2> telnet esxihost_ip 443 from loadbalancer01 to make sure port 443 on the ESXi host is open.

It would be nice if you could describe your environment topology.
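If telnet is not installed on loadbalancer01, the port test in step 2 can be approximated with Python's standard socket module. This is a minimal sketch; the function name port_open is made up here, and the host addresses are expected as command-line arguments rather than hard-coded.

```python
import socket
import sys


def port_open(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # Covers refused connections, timeouts, unreachable hosts and DNS failures.
        print(f"{host}:{port} not reachable: {exc}")
        return False


if __name__ == "__main__":
    # Pass the ESXi host addresses as arguments.
    for esxi_host in sys.argv[1:]:
        state = "open" if port_open(esxi_host) else "closed or unreachable"
        print(f"{esxi_host}:443 is {state}")
```

A successful connection only shows that the port accepts TCP connections from this node; it does not validate the certificate presented on it.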

Hi. I am facing exactly the same issue with VIO 3.0, and then with 3.1 after an upgrade, in compact mode. I really need a fix: in a POC lab environment it's hard to get tons of vCPUs and RAM, hence I am using the compact deployment model.

I have adjusted the VIO management node config (/opt/vmware/vio/etc/omjs.properties) to relax the requirements, and the deployment works fine up to 98% or so, when the deployment script tries to create a VM under OpenStack/Project (f2d4ac6a851d4f4b902eb9eed718a60a)/Images (glance?). I have already wasted days on this bug and can't find a logical explanation or any hints in the logs.

The adjusted settings relax the requirements on the minimum count of controllers, LB, etc., as well as on vMotion, DRS, etc. The controller VM and the compute VM are created just fine and I can log in to them; however, the deployment fails with "Bootstrap failed" on both VMs.

In the vSphere client task pane I can see a "Deploy OVF" task failed. I see HTTPS connect errors, not sure what they are but the management server, the controller and the compute VM are all on same management vlan on the VDS and checked connectivity among them just fine.

Name                  Target                                  Status

-----------------------------------------------------------------------------------------------------------------

Deploy OVF template   d0203844-2ee6-4fee-b93f-ecce7144e5c0    0%

after a while...

Deploy OVF template   d0203844-2ee6-4fee-b93f-ecce7144e5c0    Operation timed out

After getting the logs with "viocli deployment getlogs" and searching for errors around the time the deployment was aborted, I can see these errors in the nova and glance logs:

>>>compute01\logs\nova-compute.log (4 hits)

Line 5: 2017-03-10 20:51:01.252 1273 ERROR oslo.messaging._drivers.impl_rabbit [req-94ec5e5c-190e-4315-a092-94dc4a49c48d - - - - -] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.

Line 6: 2017-03-10 20:51:02.259 1273 ERROR oslo.messaging._drivers.impl_rabbit [req-94ec5e5c-190e-4315-a092-94dc4a49c48d - - - - -] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 2 seconds.

Line 180: 2017-03-10 20:51:12.177 1448 DEBUG oslo_service.service [req-fb976bc3-776e-4654-bd4c-2c33f67e4863 - - - - -] logging_exception_prefix = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s log_opt_values /usr/lib/python2.7/dist-packages/oslo_config/cfg.py:2519

Line 594: 2017-03-10 20:51:12.464 1448 ERROR nova.compute.manager [req-49950028-f747-4113-9fe0-3e5a5bbd7f37 - - - - -] No compute node record for host compute01

>>>compute01\logs\upstart\nova-compute.log (2 hits)

Line 10: 2017-03-10 20:51:01.252 1273 ERROR oslo.messaging._drivers.impl_rabbit [req-94ec5e5c-190e-4315-a092-94dc4a49c48d - - - - -] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 1 seconds.

Line 11: 2017-03-10 20:51:02.259 1273 ERROR oslo.messaging._drivers.impl_rabbit [req-94ec5e5c-190e-4315-a092-94dc4a49c48d - - - - -] AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] ECONNREFUSED. Trying again in 2 seconds.

>>> loadbalancer01\logs\glance-api.log (110 hits)

Line 282: 2017-03-10 20:51:54.897 10094 DEBUG oslo_db.sqlalchemy.engines [req-d9b1c15a-2a21-4737-a27c-41abaee4e192 30ad58c2a8c3434ca9c81f314a281991 f2d4ac6a851d4f4b902eb9eed718a60a - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:256

Line 305: 2017-03-10 20:53:41.265 10093 DEBUG oslo_db.sqlalchemy.engines [req-4af23a9a-8e43-45a4-8e94-144bd3355ec5 30ad58c2a8c3434ca9c81f314a281991 f2d4ac6a851d4f4b902eb9eed718a60a - - -] MySQL server mode set to STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION _check_effective_sql_mode /usr/lib/python2.7/dist-packages/oslo_db/sqlalchemy/engines.py:256

Line 397: 2017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data [req-bb2ca00e-2352-481b-98ea-25f34d119fed 30ad58c2a8c3434ca9c81f314a281991 f2d4ac6a851d4f4b902eb9eed718a60a - - -] Failed to upload image data due to internal error

Line 398: 2017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data Traceback (most recent call last):

Line 399: 2017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data File "/usr/lib/python2.7/dist-packages/glance/api/v2/image_data.py", line 114, in upload

Line 400: 2017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data image.set_data(data, size)

Line 401: 2017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data File "/usr/lib/python2.7/dist-packages/glance/domain/proxy.py", line 195, in set_data

Line 402: 2017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data self.base.set_data(data, size)

... etc

Line 446: 2017-03-10 20:54:03.434 10093 ERROR glance.common.wsgi [req-bb2ca00e-2352-481b-98ea-25f34d119fed 30ad58c2a8c3434ca9c81f314a281991 f2d4ac6a851d4f4b902eb9eed718a60a - - -] Caught error: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f90cca71150>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH

Line 447: 2017-03-10 20:54:03.434 10093 ERROR glance.common.wsgi Traceback (most recent call last):

Line 448: 2017-03-10 20:54:03.434 10093 ERROR glance.common.wsgi File "/usr/lib/python2.7/dist-packages/glance/common/wsgi.py", line 902, in __call__

etc...

Line 504: 2017-03-10 20:54:03.434 10093 ERROR glance.common.wsgi NewConnectionError: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f90cca71150>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH

>>>loadbalancer01\logs\haproxy.log (4 hits)

Line 34845: Mar 10 20:54:03 loadbalancer01 glance-api: 2017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data [req-bb2ca00e-2352-481b-98ea-25f34d119fed 30ad58c2a8c3434ca9c81f314a281991 f2d4ac6a851d4f4b902eb9eed718a60a - - -] Failed to upload image data due to internal error

#0122017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data Traceback (most recent call last):

#0122017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data File "/usr/lib/python2.7/dist-packages/glance/api/v2/image_data.py", line 114, in upload

#0122017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data image.set_data(data, size)

Line 34847: Mar 10 20:54:03 loadbalancer01 glance-api: 2017-03-10 20:54:03.434 10093 ERROR glance.common.wsgi [req-bb2ca00e-2352-481b-98ea-25f34d119fed 30ad58c2a8c3434ca9c81f314a281991 f2d4ac6a851d4f4b902eb9eed718a60a - - -] Caught error: <requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f90cca71150>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH

#0122017-03-10 20:54:03.434 10093 ERROR glance.common.wsgi Traceback (most recent call last):

#0122017-03-10 20:54:03.434 10093 ERROR glance.common.wsgi File "/usr/lib/python2.7/dist-packages/glance/common/wsgi.py", line 902, in __call__

etc

>>>vio_mgmt\logs\jarvis\jarvis.log (1 hit)

Line 6054: 2017-03-10 20:54:05,581 ERROR [jarvis.ans.task][Thread-2] task-3f84b40a-a17d-4f43-aeab-f5c167dbcb9f failed. Task failed on the following nodes: ['10.100.20.180']. Refer logs for more details
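The search described above (errors logged around the time the deployment aborted) can be narrowed with a small filter that looks at the log level field instead of matching the word ERROR anywhere in the line; a plain text search also hits DEBUG lines such as the logging_exception_prefix one listed above. This is a sketch under assumptions: the function name errors_near and the 60-second window are arbitrary choices, and the sample lines are abbreviated copies of the hits above.

```python
import re
from datetime import datetime, timedelta

# Timestamp, optional PID, then the level field must be ERROR.
# Handles both "20:51:01.252 1273 ERROR" and "20:54:05,581 ERROR".
ERROR_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[.,]\d+ (?:\d+ )?ERROR\b")
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def errors_near(lines, around, window_seconds=60):
    """Return ERROR-level lines logged within window_seconds of the given time."""
    center = datetime.strptime(around, TS_FORMAT)
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        match = ERROR_RE.search(line)
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), TS_FORMAT)
        if abs(stamp - center) <= window:
            hits.append(line)
    return hits


# Abbreviated lines from the logs quoted above.
sample = [
    "2017-03-10 20:51:01.252 1273 ERROR oslo.messaging._drivers.impl_rabbit AMQP server on 127.0.0.1:5672 is unreachable: [Errno 111] ECONNREFUSED.",
    "2017-03-10 20:51:12.177 1448 DEBUG oslo_service.service logging_exception_prefix = %(asctime)s.%(msecs)03d %(process)d ERROR %(name)s %(instance)s",
    "2017-03-10 20:53:54.788 10093 ERROR glance.api.v2.image_data Failed to upload image data due to internal error",
    "2017-03-10 20:54:05,581 ERROR [jarvis.ans.task][Thread-2] Task failed on the following nodes: ['10.100.20.180'].",
]

for hit in errors_near(sample, "2017-03-10 20:54:05"):
    print(hit)
```

With the jarvis failure time as the center, only the glance upload error and the jarvis task failure fall inside the window; the earlier AMQP connection retries on compute01 are more than three minutes older.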

Here is the info on the deployment.

root@localhost:~# viocli show -i

[all:vars]

multi_vc = False

deployment_version = 3.1.0

deployment_type = singlevm

region_name = nova

default_availability_zone = nova

cinder_volumes = "{\"nova:10.100.20.203\": {\"vcenter_ip\": \"10.100.20.203\", \"vcenter_insecure\": \"true\", \"vcenter_user\": \"administrator@vsphere.local\", \"cluster_name_list\": [\"MOS_cluster2\"], \"availability_zone_name\": \"nova\"}}"

internal_vip = 10.100.20.180

public_hostname = 10.20.0.179

public_vip = 10.20.0.179

glance_datastores = hsh:MOS_datastore:100

vcenter_ip = 10.100.20.203

vcenter_insecure = True

dvs_default_interface_name = eth2

token_expiration_time = 7200

admin_tenant_name = admin

cinder_folder = Volumes

vcenter_user = administrator@vsphere.local

glance_folder = /images

admin_user = admin

management_default_gateway = 10.100.20.202

neutron_backend = dvs

horizon_regions = "AVAILABLE_REGIONS = [('http://10.100.20.180:5000/v3','VIO'),]"

dvs_default_name = dvSwitch

keystone_backend = sql

dvs_integration_bridge = br-dvs

deployment_name = VIO

[compute]

10.100.20.181 datastore_regex="MOS\_datastore2" vcenter_ip=10.100.20.203 hostname=compute01 cluster_name=MOS_cluster2 vcenter_user=administrator@vsphere.local cluster_moid=domain-c818 vcenter_insecure=True vcenter_uuid=51C0A55D-3EF7-4753-8E0D-0E251A88DB57

[lb]

10.100.20.180 hostname=loadbalancer01

[controller]

10.100.20.180 hostname=loadbalancer01

[memcache]

10.100.20.180 hostname=loadbalancer01

[storage]

10.100.20.180 hostname=loadbalancer01

[db]

10.100.20.180 hostname=loadbalancer01

[mq]

10.100.20.180 hostname=loadbalancer01

[dhcp]

10.100.20.180 hostname=loadbalancer01

I have tried "viocli deployment configure" after it failed; it ran fine almost to the end, then failed on the same "Deploy OVF template" task.

Same with "viocli deployment post-deploy".

Attaching the "viocli deployment getlogs" output, with passwords scrubbed.