VMware Cloud Community
el_mehdi_ora
Contributor

Failed to deploy VIO 3.0 HA Mode

Hi,


I deployed HA mode again, but it failed with the same error. After the failed deployment I found the following entries in ansible.log:

2016-09-28 09:35:33,421 p=19127 u=jarvis |  PLAY [localhost] **************************************************************
2016-09-28 09:35:33,421 p=19127 u=jarvis |  GATHERING FACTS ***************************************************************
2016-09-28 09:35:33,893 p=19127 u=jarvis |  ok: [localhost]
2016-09-28 09:35:33,894 p=19127 u=jarvis |  TASK: [Clear ARP cache] *******************************************************
2016-09-28 09:35:33,958 p=19127 u=jarvis |  changed: [localhost]
2016-09-28 09:35:33,959 p=19127 u=jarvis |  PLAY [all:!localhost] *********************************************************
2016-09-28 09:35:33,960 p=19127 u=jarvis |  TASK: [Verify SSH connectivity with every VIO node] ***************************
2016-09-28 09:35:37,179 p=19127 u=jarvis |  ok: [192.168.240.90 -> 127.0.0.1]
2016-09-28 09:35:37,404 p=19127 u=jarvis |  ok: [192.168.240.84 -> 127.0.0.1]
2016-09-28 09:35:37,430 p=19127 u=jarvis |  ok: [192.168.240.91 -> 127.0.0.1]
2016-09-28 09:35:37,435 p=19127 u=jarvis |  ok: [192.168.240.87 -> 127.0.0.1]
2016-09-28 09:35:37,437 p=19127 u=jarvis |  ok: [192.168.240.85 -> 127.0.0.1]
2016-09-28 09:35:37,484 p=19127 u=jarvis |  ok: [192.168.240.88 -> 127.0.0.1]
2016-09-28 09:40:35,462 p=19127 u=jarvis |  failed: [192.168.240.86 -> 127.0.0.1] => {"elapsed": 301, "failed": true}
2016-09-28 09:40:35,463 p=19127 u=jarvis |  msg: Timeout when waiting for search string OpenSSH in 192.168.240.86:22
2016-09-28 09:40:35,465 p=19127 u=jarvis |  failed: [192.168.240.89 -> 127.0.0.1] => {"elapsed": 301, "failed": true}
2016-09-28 09:40:35,465 p=19127 u=jarvis |  msg: Timeout when waiting for search string OpenSSH in 192.168.240.89:22
2016-09-28 09:40:35,475 p=19127 u=jarvis |  FATAL: all hosts have already failed -- aborting

NB: 192.168.240.86 is VIO-LoadBalancer-0 and 192.168.240.89 is VIO-DB-2. I confirm that both nodes are on the same host. I also checked that SSH is enabled on that host and that the host firewall allows all the IP addresses.

ZhangAdam
VMware Employee

Hi,

Did you SSH to the two VMs before?

Maybe you can clean up .ssh/known_hosts, increase the timeout, and try again.

In /var/lib/vio/ansible/site.yml, edit this task:

  tasks:
    - name: Verify SSH connectivity with every VIO node
      local_action: wait_for port=22 host="{{ inventory_hostname }}" search_regex=OpenSSH delay=3

Add a timeout parameter to the task (the default value is 300 seconds), for example:

  tasks:
    - name: Verify SSH connectivity with every VIO node
      local_action: wait_for port=22 host="{{ inventory_hostname }}" search_regex=OpenSSH delay=3 timeout=600
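
To reproduce this check by hand from the management server, a small script can open port 22 on a node and look for the OpenSSH banner, which is roughly what the wait_for task with search_regex=OpenSSH does. This is only a minimal sketch: the function names (banner_matches, read_banner, wait_for_ssh) are made up for illustration, and the retry loop only loosely mirrors the delay and timeout parameters of the task.

```python
import re
import socket
import time


def banner_matches(banner: bytes, pattern: str = "OpenSSH") -> bool:
    """Return True if the text read from the port contains the pattern."""
    text = banner.decode("utf-8", errors="replace")
    return re.search(pattern, text) is not None


def read_banner(host: str, port: int = 22, connect_timeout: float = 5.0) -> bytes:
    """Open a TCP connection and return the first bytes the server sends."""
    with socket.create_connection((host, port), timeout=connect_timeout) as sock:
        sock.settimeout(connect_timeout)
        return sock.recv(256)


def wait_for_ssh(host: str, port: int = 22, timeout: float = 300.0,
                 delay: float = 3.0, pattern: str = "OpenSSH") -> bool:
    """Poll host:port until the banner matches or the timeout expires."""
    time.sleep(delay)  # initial wait before the first attempt
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if banner_matches(read_banner(host, port), pattern):
                return True
        except OSError:
            # Connection refused, unreachable network, or read timeout: retry.
            pass
        time.sleep(1)
    return False


# Example call for one of the failing nodes from this thread:
# print(wait_for_ssh("192.168.240.86", timeout=30, delay=0))
```

If this returns False while other nodes on the same network answer immediately, a longer timeout is unlikely to help, and the node's network or sshd is the more probable cause.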

Best Regards

Adam

el_mehdi_ora
Contributor

Hi Adam,

Thank you for your feedback and support. I followed your recommendations, but in vain: I still get the same error, although now only on one node, VIO-DB02:

2016-09-29 09:41:20,264 p=19127 u=jarvis |  PLAY [localhost] **************************************************************
2016-09-29 09:41:20,264 p=19127 u=jarvis |  GATHERING FACTS ***************************************************************
2016-09-29 09:41:20,596 p=19127 u=jarvis |  ok: [localhost]
2016-09-29 09:41:20,597 p=19127 u=jarvis |  TASK: [Clear ARP cache] *******************************************************
2016-09-29 09:41:20,660 p=19127 u=jarvis |  changed: [localhost]
2016-09-29 09:41:20,661 p=19127 u=jarvis |  PLAY [all:!localhost] *********************************************************
2016-09-29 09:41:20,662 p=19127 u=jarvis |  TASK: [Verify SSH connectivity with every VIO node] ***************************
2016-09-29 09:41:24,031 p=19127 u=jarvis |  ok: [192.168.240.90 -> 127.0.0.1]
2016-09-29 09:41:24,074 p=19127 u=jarvis |  ok: [192.168.240.91 -> 127.0.0.1]
2016-09-29 09:41:24,105 p=19127 u=jarvis |  ok: [192.168.240.87 -> 127.0.0.1]
2016-09-29 09:41:24,107 p=19127 u=jarvis |  ok: [192.168.240.89 -> 127.0.0.1]
2016-09-29 09:41:24,109 p=19127 u=jarvis |  ok: [192.168.240.88 -> 127.0.0.1]
2016-09-29 09:41:24,120 p=19127 u=jarvis |  ok: [192.168.240.84 -> 127.0.0.1]
2016-09-29 09:41:24,122 p=19127 u=jarvis |  ok: [192.168.240.85 -> 127.0.0.1]
2016-09-29 09:51:22,090 p=19127 u=jarvis |  failed: [192.168.240.86 -> 127.0.0.1] => {"elapsed": 601, "failed": true}
2016-09-29 09:51:22,091 p=19127 u=jarvis |  msg: Timeout when waiting for search string OpenSSH in 192.168.240.86:22
2016-09-29 09:51:22,102 p=19127 u=jarvis |  FATAL: all hosts have already failed -- aborting
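
When comparing runs like the two above, it can help to pull the per-host results out of ansible.log instead of reading it by eye. The following is a rough sketch that assumes the log line format shown in this thread; the regular expression and the summarize function are illustrative only and are not part of VIO or Ansible.

```python
import json
import re

# Matches per-host result lines such as:
#   ... |  ok: [192.168.240.90 -> 127.0.0.1]
#   ... |  failed: [192.168.240.86 -> 127.0.0.1] => {"elapsed": 601, "failed": true}
HOST_LINE = re.compile(
    r"\|\s+(?P<status>ok|failed): \[(?P<host>[^\s\]]+)[^\]]*\]"
    r"(?: => (?P<result>\{.*\}))?"
)


def summarize(log_text):
    """Return (ok_hosts, failed) where failed maps host -> elapsed seconds."""
    ok_hosts = []
    failed = {}
    for line in log_text.splitlines():
        match = HOST_LINE.search(line)
        if match is None:
            continue  # play/task headers, msg lines, and so on
        host = match.group("host")
        if match.group("status") == "ok":
            ok_hosts.append(host)
        else:
            result = match.group("result")
            # The failure details are logged as a JSON object.
            details = json.loads(result) if result else {}
            failed[host] = details.get("elapsed")
    return ok_hosts, failed


# Example usage (path taken from a later post in this thread):
# with open("/var/log/jarvis/ansible.log") as handle:
#     print(summarize(handle.read()))
```

The elapsed value is worth a look: it shows how long the task actually waited, so it indicates whether a changed timeout was picked up by the run.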

Sarwankumar
Contributor

Hi, I faced the same issue, even after editing the mentioned file /var/lib/vio/ansible/site.yml to add timeout=600.

viouser@vio:/var/log/jarvis$ tail -f ansible.log

2017-02-18 10:09:28,430 p=358 u=jarvis |  ok: [10.6.131.91 -> 127.0.0.1]
2017-02-18 10:09:28,433 p=358 u=jarvis |  ok: [10.6.131.83 -> 127.0.0.1]
2017-02-18 10:09:28,441 p=358 u=jarvis |  ok: [10.6.131.88 -> 127.0.0.1]
2017-02-18 10:14:25,391 p=358 u=jarvis |  failed: [10.6.131.90 -> 127.0.0.1] => {"elapsed": 300, "failed": true}
2017-02-18 10:14:25,391 p=358 u=jarvis |  msg: Timeout when waiting for search string OpenSSH in 10.6.131.90:22
2017-02-18 10:14:27,394 p=358 u=jarvis |  failed: [10.6.131.84 -> 127.0.0.1] => {"elapsed": 302, "failed": true}
2017-02-18 10:14:27,395 p=358 u=jarvis |  msg: Timeout when waiting for search string OpenSSH in 10.6.131.84:22
2017-02-18 10:14:27,419 p=358 u=jarvis |  failed: [10.6.131.87 -> 127.0.0.1] => {"elapsed": 302, "failed": true}
2017-02-18 10:14:27,419 p=358 u=jarvis |  msg: Timeout when waiting for search string OpenSSH in 10.6.131.87:22
2017-02-18 10:14:27,440 p=358 u=jarvis |  FATAL: all hosts have already failed -- aborting

Thanks

Sarwan

Sarwankumar
Contributor

Now it works for me. Interestingly, this is what I found:

1- If you log in to one of the failing VMs and check the network, for example by pinging another machine, it won't work. The reason: the network is down on these VMs.

2- In fact, you can't get any VM connected through this portgroup.

3- Go to your ESXi host and check it; I am sure the network adapter is not working properly. [This was the issue in my case.]

4- I changed the network adapter and connected my portgroup to the new adapter.

5- Retry the VIO deployment.

It worked well for me without any issue.

Thanks

Sarwan
