VMware Cloud Community
el_mehdi_ora
Contributor

VIO 3.0 Failed to deploy HA Mode

Hello everybody,

I tried to deploy VIO 3.0 in HA mode, but I got the deployment error shown here. I then modified the /var/lib/vio/ansible/site.yml file to add the following parameter:
local_action: wait_for port=22 host="{{ inventory_hostname }}" search_regex=OpenSSH delay=3 timeout=600

I checked the connectivity and it is fine, the OMS service is started, and the datastore is healthy, but the HA deployment still fails.
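
For anyone who wants to repeat this check by hand, the idea behind the wait_for task above (connect to port 22 and look for OpenSSH in the banner) can be approximated in a few lines of Python. This is only a rough sketch and not the Ansible or VIO implementation; the names banner_matches, read_banner, and ssh_banner_ok are made up for illustration.

```python
import re
import socket


def banner_matches(banner, pattern="OpenSSH"):
    """Return True if the banner text contains a match for pattern."""
    return re.search(pattern, banner) is not None


def read_banner(host, port=22, timeout=5.0):
    """Return the first bytes sent by host:port as text, or None on failure.

    Hypothetical helper: it reads at most 256 bytes, which is normally
    enough for an SSH identification string.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            return sock.recv(256).decode("ascii", errors="replace")
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return None


def ssh_banner_ok(host, port=22, pattern="OpenSSH", timeout=5.0):
    """Return True if host:port answers with a banner matching pattern."""
    banner = read_banner(host, port, timeout)
    return banner is not None and banner_matches(banner, pattern)
```

Running ssh_banner_ok("192.168.40.86") from the machine that runs the playbook should return False for as long as the node cannot be reached on port 22.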


I deployed in HA mode again, but it gave me the same error. In the ansible.log file, I found the following entries from the failed deployment:

2016-09-28 09:35:33,421 p=19127 u=jarvis |  PLAY [localhost] **************************************************************

2016-09-28 09:35:33,421 p=19127 u=jarvis | GATHERING FACTS ***************************************************************

2016-09-28 09:35:33,893 p=19127 u=jarvis |  ok: [localhost]

2016-09-28 09:35:33,894 p=19127 u=jarvis |  TASK: [Clear ARP cache] *******************************************************

2016-09-28 09:35:33,958 p=19127 u=jarvis |  changed: [localhost]

2016-09-28 09:35:33,959 p=19127 u=jarvis |  PLAY [all:!localhost] *********************************************************

2016-09-28 09:35:33,960 p=19127 u=jarvis |  TASK: [Verify SSH connectivity with every VIO node] ***************************

2016-09-28 09:35:37,179 p=19127 u=jarvis |  ok: [192.168.40.90 -> 127.0.0.1]

2016-09-28 09:35:37,404 p=19127 u=jarvis |  ok: [192.168.40.84 -> 127.0.0.1]

2016-09-28 09:35:37,430 p=19127 u=jarvis |  ok: [192.168.40.91 -> 127.0.0.1]

2016-09-28 09:35:37,435 p=19127 u=jarvis |  ok: [192.168.40.87 -> 127.0.0.1]

2016-09-28 09:35:37,437 p=19127 u=jarvis |  ok: [192.168.40.85 -> 127.0.0.1]

2016-09-28 09:35:37,484 p=19127 u=jarvis |  ok: [192.168.40.88 -> 127.0.0.1]

2016-09-28 09:40:35,462 p=19127 u=jarvis |  failed: [192.168.40.86 -> 127.0.0.1] => {"elapsed": 301, "failed": true}

2016-09-28 09:40:35,463 p=19127 u=jarvis |  msg: Timeout when waiting for search string OpenSSH in 192.168.240.86:22

2016-09-28 09:40:35,465 p=19127 u=jarvis |  failed: [192.168.40.89 -> 127.0.0.1] => {"elapsed": 301, "failed": true}

2016-09-28 09:40:35,465 p=19127 u=jarvis |  msg: Timeout when waiting for search string OpenSSH in 192.168.40.89:22

2016-09-28 09:40:35,475 p=19127 u=jarvis |  FATAL: all hosts have already failed -- aborting
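
With longer logs it can help to pull out just the hosts that failed. The following is a small sketch that assumes the log lines follow the "failed: [host -> delegate]" shape seen above; the function name failed_hosts and the regular expression are illustrative assumptions, not part of Ansible or VIO.

```python
import re

# Matches entries such as: failed: [192.168.40.86 -> 127.0.0.1] => {...}
# The "-> delegate" part is treated as optional.
FAILED_RE = re.compile(r"failed: \[(?P<host>[^\s\]]+)(?: -> [^\]]+)?\]")


def failed_hosts(log_lines):
    """Return the hosts reported as failed, in order of first appearance."""
    hosts = []
    for line in log_lines:
        match = FAILED_RE.search(line)
        if match and match.group("host") not in hosts:
            hosts.append(match.group("host"))
    return hosts


# Sample lines copied from the log in this post.
sample = [
    "2016-09-28 09:35:37,484 p=19127 u=jarvis |  ok: [192.168.40.88 -> 127.0.0.1]",
    '2016-09-28 09:40:35,462 p=19127 u=jarvis |  failed: [192.168.40.86 -> 127.0.0.1] => {"elapsed": 301, "failed": true}',
    '2016-09-28 09:40:35,465 p=19127 u=jarvis |  failed: [192.168.40.89 -> 127.0.0.1] => {"elapsed": 301, "failed": true}',
    "2016-09-28 09:40:35,475 p=19127 u=jarvis |  FATAL: all hosts have already failed -- aborting",
]
print(failed_hosts(sample))  # ['192.168.40.86', '192.168.40.89']
```

For the log above this leaves only 192.168.40.86 and 192.168.40.89, the two nodes that never answered on port 22.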

5 Replies
lserpietri
Enthusiast

What was the error when you first tried to deploy VIO? Did you modify site.yml before that?

el_mehdi_ora
Contributor

Hi,


I had the same error even before changing the site.yml file. I modified it to try to resolve the problem, but I still cannot deploy VIO in HA mode.

lserpietri
Enthusiast

Can you revert the changes you made, deploy again, and share the logs? It looks like a connectivity issue from OMS...

el_mehdi_ora
Contributor

Hi,

I reverted the changes to the site.yml file and redeployed, but I got the same error again. I have attached the logs to give you an idea.

lserpietri
Enthusiast

From what I see, it looks like the IP configuration is pushed correctly to the DB0 node. Ansible fails to SSH into it, so I would look at the DB0 node to check whether the operating system is actually reporting the correct IP.

Then I would look at your networking configuration: are all the uplinks for the DVS set up correctly? Are the port groups configured correctly? Has your physical network undergone any changes lately? (cf. VMware Integrated OpenStack Information)

Also, it looks like all VIO nodes are getting deployed on "Local ESXi 0x" datastores. I'm assuming those are local datastores for the ESXi hosts, and that's not great, as you won't be able to protect the VIO nodes with cluster features such as HA. Did you modify the omjs.properties file?
