Hi,
I am trying to configure Ceilometer in VMware Integrated OpenStack 2.0, but the configuration fails at 51%.
From the Ansible logs under Jarvis on the VIO management server, I can see some fatal errors:
"SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue."
FATAL: no hosts matched or all hosts have already failed -- aborting.
Could you please suggest how to configure Ceilometer successfully?
Thanks
Ratnajit
Hello Ratnajit,
The SSH connection issue is usually caused by networking problems. Could you please check whether a duplicate IP is configured? Can you ping the Ceilometer node from the VIO manager when the issue happens? Thanks.
Hi Steven,
I have checked and found that all the hosts in the VIO management cluster are running the SSH client and server, and all the IPs assigned on the management network are unique. When the Ceilometer configuration process created the MongoDB VMs, I was able to SSH from the VIO management server to all three of them, and all of them were pingable from the management server. However, all these VMs are deleted when the configuration fails at 51%, so there is only a short window to test SSH from the MongoDB VMs.
In my observation, it is not possible to SSH from the Memcache DB VMs to the MongoDB VMs; it fails with "permission denied". All the other management VMs are able to SSH to the Memcache DB VMs. It seems that the keys are not exchanged between them.
The challenge is: what configuration can I do, given that all the VMs are created at runtime and then deleted when the Ceilometer deployment fails? I have only about 5-10 minutes before the VMs are deleted.
Thanks
Ratnajit
Hi Steven,
Can you please provide some information on how to proceed? Any additional information would be really helpful.
Thanks
Ratnajit
Hi RatnajitHCL,
Can you please upload the OMS log?
Please run "viocli deployment getlogs" on the management server to collect all the logs.
Thanks
Yixing
Please run "viocli deployment getlogs" to upload the log. Thanks.
oms.log.1 does not contain much information.
It seems the environment is too slow:
2016-05-02 06:06:42,236 p=569 u=jarvis | TASK: [mongodb | wait for mongodb server to start] ****************************
2016-05-02 06:07:23,518 p=569 u=jarvis | ok: [10.110.50.84]
2016-05-02 06:11:42,738 p=569 u=jarvis | failed: [10.110.50.86] => {"elapsed": 300, "failed": true}
2016-05-02 06:11:42,739 p=569 u=jarvis | msg: Timeout when waiting for 127.0.0.1:27017
2016-05-02 06:11:42,778 p=569 u=jarvis | failed: [10.110.50.85] => {"elapsed": 300, "failed": true}
2016-05-02 06:11:42,778 p=569 u=jarvis | msg: Timeout when waiting for 127.0.0.1:27017
2016-05-02 06:11:42,787 p=569 u=jarvis | FATAL: all hosts have already failed -- aborting
We need to extend the timeout so that the MongoDB task can complete. The default wait_for timeout is 300 seconds; how about changing it from 300 to 900?
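For reference, the "elapsed": 300 and "Timeout when waiting for 127.0.0.1:27017" lines above are what wait_for reports when nothing starts listening on the port in time. A minimal Python sketch of that polling behaviour, assuming a plain TCP connect is a fair stand-in for what the module checks (wait_for_port is a hypothetical name, not part of Ansible):

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 300.0,
                  interval: float = 1.0) -> bool:
    """Poll host:port until a TCP connect succeeds or `timeout` seconds pass.

    Hypothetical helper: a rough analogue of Ansible's wait_for with `port:` set.
    The 300-second default mirrors the "elapsed": 300 seen in the log.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed connect means something (here: mongod) is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            pass  # refused or timed out: the service is not up yet
        if time.monotonic() >= deadline:
            # Equivalent of "Timeout when waiting for 127.0.0.1:27017".
            return False
        time.sleep(interval)
```

On a slow environment mongod simply needs longer than 300 seconds to open port 27017, so raising the timeout gives it that time; nothing in MongoDB itself changes.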
Locate the following file on the OMS:
/var/lib/vio/ansible/roles/mongodb/tasks/main.yml
Change
- name: wait for mongodb server to start
  wait_for:
    port: 27017
  tags:
    - config
to
- name: wait for mongodb server to start
  wait_for:
    port: 27017
    timeout: 900
  tags:
    - config
There are two occurrences of this task in the file; change both. Then enable Ceilometer again and see whether it works.
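If you prefer not to edit both occurrences by hand, the change can be scripted. A sketch in Python, assuming the task layout matches the snippet above (a port: line directly under wait_for:); add_wait_for_timeout is a hypothetical helper, and main.yml should be backed up first:

```python
TASK_NAME = "wait for mongodb server to start"


def add_wait_for_timeout(text: str, seconds: int = 900) -> tuple[str, int]:
    """Add `timeout: <seconds>` under `port:` in every matching wait_for task.

    Hypothetical helper. Returns (new_text, number_of_tasks_changed). A task
    that already has a timeout line right after `port:` is left alone, so
    running it twice is harmless.
    """
    lines = text.splitlines(keepends=True)
    out = []
    in_task = False
    changed = 0
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("- name:"):
            # A new task starts; only the MongoDB wait task is of interest.
            in_task = stripped[len("- name:"):].strip() == TASK_NAME
        elif in_task and stripped.startswith("port:"):
            in_task = False  # handle only the first port: line of the task
            nxt = lines[i + 1].strip() if i + 1 < len(lines) else ""
            if not nxt.startswith("timeout:"):
                # Reuse the port line's indentation so YAML nesting stays valid.
                indent = line[: len(line) - len(line.lstrip())]
                out.append(line if line.endswith("\n") else line + "\n")
                out.append(f"{indent}timeout: {seconds}\n")
                changed += 1
                continue
        out.append(line)
    return "".join(out), changed
```

For example, read /var/lib/vio/ansible/roles/mongodb/tasks/main.yml, pass its text to add_wait_for_timeout, check that the returned count is 2, and write the result back. Editing the text directly, rather than round-tripping through a YAML library, keeps the file's comments and ordering intact.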
Your MongoDB task succeeded, but the deployment failed on another task due to a timeout. You can check the progress in /var/log/jarvis/ansible.log to see which task failed:
2016-05-12 11:19:05,007 p=570 u=jarvis | ok: [10.110.50.75]
2016-05-12 11:19:05,016 p=570 u=jarvis | TASK: [config-local | create an admin tenant] *********************************
2016-05-12 11:19:05,542 p=570 u=jarvis | ok: [10.110.50.74]
2016-05-12 11:19:05,542 p=570 u=jarvis | TASK: [config-local | create a cloud admin user] ******************************
2016-05-12 11:19:06,692 p=570 u=jarvis | ok: [10.110.50.74]
2016-05-12 11:19:06,692 p=570 u=jarvis | TASK: [config-local | grant admin role to admin user on cloud tenant] *********
2016-05-12 11:19:07,255 p=570 u=jarvis | ok: [10.110.50.74]
2016-05-12 11:19:07,255 p=570 u=jarvis | TASK: [config-local | grant heat_stack_owner role to admin user on service tenant] ***
2016-05-12 11:19:07,808 p=570 u=jarvis | ok: [10.110.50.74]
2016-05-12 11:19:07,809 p=570 u=jarvis | TASK: [config-local | download stream-optimized image] ************************
2016-05-12 11:22:19,263 p=570 u=jarvis | failed: [10.110.50.74] => {"failed": true}
2016-05-12 11:22:19,333 p=570 u=jarvis | msg: failed to create temporary content file: timed out
2016-05-12 11:22:19,333 p=570 u=jarvis | FATAL: all hosts have already failed -- aborting
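To find the failing task without reading the whole log, pair each "failed:" line with the most recent TASK header. A rough Python sketch, assuming the log line format shown above; failed_tasks and report are hypothetical helpers:

```python
import re

# Patterns for the ansible.log format shown above (assumed stable across runs).
TASK_RE = re.compile(r"TASK: \[(?P<task>[^\]]+)\]")
FAILED_RE = re.compile(r"\| failed: \[(?P<host>[^\]]+)\]")


def failed_tasks(log_lines):
    """Yield (task, host) for each 'failed:' line, using the last TASK header seen."""
    current_task = None
    for line in log_lines:
        task = TASK_RE.search(line)
        if task:
            current_task = task.group("task")
            continue
        failed = FAILED_RE.search(line)
        if failed and current_task is not None:
            yield current_task, failed.group("host")


def report(path="/var/log/jarvis/ansible.log"):
    """Print every failed host/task pair found in the Jarvis Ansible log."""
    with open(path, errors="replace") as log:
        for task, host in failed_tasks(log):
            print(f"{host}: {task}")
```

Run over the excerpt above, failed_tasks yields a single pair: the "config-local | download stream-optimized image" task on 10.110.50.74.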
You need to extend the timeout on this task as well.
Locate the following file on the OMS:
/var/lib/vio/ansible/roles/config-local/tasks/main.yml
Change
- name: download stream-optimized image
  get_url:
    url: "http://{{ imageserver }}/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk"
    dest: /tmp/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk
  run_once: true
- name: import image into glance
  glance_image:
    auth_url: "{{ auth_url }}"
    login_tenant_name: "{{ admin_tenant_name }}"
    login_username: "{{ admin_user }}"
    login_password: "{{ admin_password }}"
    name: "{{ image_name | default('ubuntu-14.04-server-amd64') }}"
    disk_format: vmdk
    # TODO(browne): Ansible glance_image module ignores these. Fix upstream
    # min_ram: "{{ image_min_ram | default(512) }}"
    # min_disk: "{{ image_min_disk | default(5) }}"
    file: "/tmp/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk"
    is_public: True
    timeout: 1800
    endpoint_type: internalURL
  run_once: true
to (this adds timeout: 30 to get_url and raises the glance_image timeout from 1800 to 3600):
- name: download stream-optimized image
  get_url:
    url: "http://{{ imageserver }}/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk"
    dest: /tmp/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk
    timeout: 30
  run_once: true
- name: import image into glance
  glance_image:
    auth_url: "{{ auth_url }}"
    login_tenant_name: "{{ admin_tenant_name }}"
    login_username: "{{ admin_user }}"
    login_password: "{{ admin_password }}"
    name: "{{ image_name | default('ubuntu-14.04-server-amd64') }}"
    disk_format: vmdk
    # TODO(browne): Ansible glance_image module ignores these. Fix upstream
    # min_ram: "{{ image_min_ram | default(512) }}"
    # min_disk: "{{ image_min_disk | default(5) }}"
    file: "/tmp/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk"
    is_public: True
    timeout: 3600
    endpoint_type: internalURL
  run_once: true
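The error "failed to create temporary content file: timed out" suggests get_url gave up while streaming the image into a temporary file. Its timeout option is a per-request timeout (10 seconds by default, as far as I know), which is why raising it to 30 helps on a slow link. A Python sketch of the same download-to-temp-then-move pattern with urllib, as an illustration only (fetch is a hypothetical name; this is not the get_url implementation):

```python
import os
import shutil
import tempfile
import urllib.request


def fetch(url: str, dest: str, timeout: float = 30.0) -> str:
    """Download `url` to `dest`; a stalled connect or read raises after `timeout` s.

    Hypothetical helper illustrating the download-to-temp-then-move pattern.
    The timeout applies to each blocking socket operation, not to the whole
    transfer, so a large image on a slow but steady link still completes.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    tmp_name = None
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp, \
                tempfile.NamedTemporaryFile(dir=dest_dir, delete=False) as tmp:
            tmp_name = tmp.name
            shutil.copyfileobj(resp, tmp)
        # Only a complete download is moved into place.
        os.replace(tmp_name, dest)
    except BaseException:
        # Do not leave a partial temporary file behind on timeout or error.
        if tmp_name is not None and os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return dest
```

The glance_image timeout is a separate setting: it bounds the import into Glance after the download has finished, so both values may need raising on a slow environment.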
Hi ZhangAdam,
I am attaching new logs, and I can see new errors:
2016-05-13 08:21:42,491 p=568 u=jarvis | TASK: [mongodb | ensure mongodb admin user is present] ************************
2016-05-13 08:21:43,641 p=568 u=jarvis | failed: [10.110.50.84] => (item=%) => {"failed": true, "item": "%", "parsed": false}
2016-05-13 08:21:43,641 p=568 u=jarvis | SUDO-SUCCESS-nxoqtweglscdaonhjmjzsobhaxdtuqrf
Traceback (most recent call last):
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127702.9-8027070333158/mongodb_user", line 1817, in <module>
    main()
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127702.9-8027070333158/mongodb_user", line 238, in main
    user_add(module, client, db_name, user, password, roles)
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127702.9-8027070333158/mongodb_user", line 142, in user_add
    db.add_user(user, password, None, roles=roles)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 871, in add_user
    (not uinfo["users"]), name, password, read_only, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 793, in _create_or_update_user
    self.command(command_name, name, **opts)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 454, in command
    codec_options, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 366, in _command
    allowable_errors)
  File "/usr/lib/python2.7/dist-packages/pymongo/pool.py", line 189, in command
    self._raise_connection_failure(error)
  File "/usr/lib/python2.7/dist-packages/pymongo/pool.py", line 316, in _raise_connection_failure
    raise error
pymongo.errors.NotMasterError: not master
2016-05-13 08:21:44,549 p=568 u=jarvis | failed: [10.110.50.84] => (item=localhost) => {"failed": true, "item": "localhost", "parsed": false}
2016-05-13 08:21:44,549 p=568 u=jarvis | SUDO-SUCCESS-vqkjvosxrzcbxaxnkcedbeyyqhjcelqe
Traceback (most recent call last):
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127703.93-8740217426243/mongodb_user", line 1817, in <module>
    main()
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127703.93-8740217426243/mongodb_user", line 238, in main
    user_add(module, client, db_name, user, password, roles)
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127703.93-8740217426243/mongodb_user", line 142, in user_add
    db.add_user(user, password, None, roles=roles)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 871, in add_user
    (not uinfo["users"]), name, password, read_only, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 793, in _create_or_update_user
    self.command(command_name, name, **opts)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 454, in command
    codec_options, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 366, in _command
    allowable_errors)
  File "/usr/lib/python2.7/dist-packages/pymongo/pool.py", line 189, in command
    self._raise_connection_failure(error)
  File "/usr/lib/python2.7/dist-packages/pymongo/pool.py", line 316, in _raise_connection_failure
    raise error
pymongo.errors.NotMasterError: not master
Please help me resolve the issue.
Thanks
Ratnajit
Hi,
This is also caused by the slow environment. You can either reduce MongoDB to one node or extend the timeout of the previous task.
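My reading of the NotMasterError above: with three MongoDB nodes, the replica set has not yet elected a primary when the "ensure mongodb admin user is present" task runs, and user creation is only accepted on the primary. Extending the timeout of the previous task amounts to waiting until a primary exists. A generic Python sketch of that wait (wait_until_primary is hypothetical; the check function is passed in so the sketch does not depend on pymongo):

```python
import time
from typing import Callable


def wait_until_primary(is_primary: Callable[[], bool],
                       timeout: float = 900.0,
                       interval: float = 5.0,
                       clock: Callable[[], float] = time.monotonic,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_primary() until it returns True or `timeout` seconds pass.

    Hypothetical helper. Any exception raised by the check (for example a
    "not master" error while the election is still running) is treated as
    "not primary yet" rather than as a failure.
    """
    deadline = clock() + timeout
    while True:
        try:
            if is_primary():
                return True
        except Exception:
            pass  # node still starting or still a secondary
        if clock() >= deadline:
            return False
        sleep(interval)
```

With pymongo, is_primary could be something like lambda: client.admin.command("ismaster")["ismaster"], where client is a MongoClient connected to the node; that call is an assumption about your pymongo version. The clock and sleep parameters only exist so the loop can be tested without real waiting. Reducing MongoDB to a single node presumably sidesteps the election, which would explain why that option works too.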
To reduce MongoDB to one node:
1. In the file /opt/vmware/vio/etc/omjs.properties, change
oms.nodes.number.mongodb = 3
to 1.
2. Restart OMS by running "restart oms".
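Step 1 can also be scripted. A minimal Python sketch, assuming omjs.properties uses plain key = value lines as shown; set_property and set_mongodb_nodes are hypothetical helpers, and "restart oms" is still needed afterwards:

```python
import re
from pathlib import Path

KEY = "oms.nodes.number.mongodb"


def set_property(text: str, key: str, value: str) -> str:
    """Return `text` with the uncommented `key = ...` line set to `key = value`."""
    # Match only lines that start (after optional indentation) with the exact key,
    # so commented-out lines and longer keys with the same prefix are skipped.
    pattern = re.compile(rf"^([ \t]*){re.escape(key)}[ \t]*[=:].*$", re.MULTILINE)
    new_text, count = pattern.subn(rf"\g<1>{key} = {value}", text)
    if count == 0:
        raise KeyError(f"{key} not found")
    return new_text


def set_mongodb_nodes(count: int = 1,
                      path: str = "/opt/vmware/vio/etc/omjs.properties") -> None:
    """Hypothetical helper: rewrite the MongoDB node count in place."""
    props = Path(path)
    props.write_text(set_property(props.read_text(), KEY, str(count)))
```

Raising KeyError when the key is missing is deliberate: if the property is not where this thread says it is, it is better to stop than to silently leave the node count at 3.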
Hi ZhangAdam,
Thanks a lot for your time and for helping to resolve the issue.
As suggested, I reduced the number of MongoDB nodes to 1 and then ran the installation again. Ceilometer is now installed and shows as enabled.
Many Thanks
Ratnajit