VMware Cloud Community
RatnajitHCL
Contributor

Ceilometer configuration fails at 51% in VMware Integrated OpenStack

Hi,

I am trying to configure Ceilometer in VMware Integrated OpenStack 2.0, but the configuration fails at 51%.

In the Ansible logs under Jarvis on the VIO management server, I can see the following fatal errors:

SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue.

FATAL: no hosts matched or all hosts have already failed -- aborting.

Can anyone suggest how to configure Ceilometer successfully?

Thanks

Ratnajit

Accepted Solution
ZhangAdam
VMware Employee

Hi,

This is also caused by the slow environment. You can either reduce MongoDB to one node or extend the timeout of the previous task.

To reduce MongoDB to one node:

1. In the file /opt/vmware/vio/etc/omjs.properties, change

oms.nodes.number.mongodb = 3

to

oms.nodes.number.mongodb = 1

2. Restart OMS by running "restart oms".
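If you prefer to script the omjs.properties edit instead of doing it by hand, here is a minimal Python sketch. It assumes the file uses simple key = value lines like the one above (it does not handle other .properties syntax such as ':' separators or line continuations), and set_property and set_property_in_file are hypothetical helper names. It keeps a .bak copy of the original; you still need to restart OMS afterwards.

```python
import re


def set_property(text, key, value):
    """Return `text` with every `key = ...` line set to `value`.

    Assumes simple `key = value` lines; comments and other keys are kept.
    Raises KeyError if the key is not found.
    """
    pattern = re.compile(
        r"^([ \t]*" + re.escape(key) + r"[ \t]*=[ \t]*).*$", re.MULTILINE
    )
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), text)
    if count == 0:
        raise KeyError(key)
    return new_text


def set_property_in_file(path, key, value):
    """Update `key` in the file at `path`, writing a `.bak` copy first."""
    with open(path) as f:
        original = f.read()
    updated = set_property(original, key, value)  # raises before any write
    with open(path + ".bak", "w") as f:
        f.write(original)
    with open(path, "w") as f:
        f.write(updated)
```

For example, set_property_in_file("/opt/vmware/vio/etc/omjs.properties", "oms.nodes.number.mongodb", 1), followed by "restart oms".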

admin
Immortal

Hello Ratnajit,

The SSH connection issue is usually caused by networking problems. Could you please check whether a duplicate IP address is configured? Can you ping the Ceilometer node from the VIO manager when the issue happens? Thanks.
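Ping uses ICMP, which usually needs root privileges from a script. As a rough stand-in, here is a minimal Python sketch that checks whether TCP port 22 (SSH) accepts connections on each node; tcp_port_open and check_hosts are hypothetical names, and the host list you pass in is up to you. Note that this only shows that something answers on port 22; it does not prove there is no duplicate IP.

```python
import socket


def tcp_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def check_hosts(hosts, port=22, timeout=3.0):
    """Map each host to whether its port accepted a TCP connection."""
    return {host: tcp_port_open(host, port, timeout) for host in hosts}
```

Running check_hosts on the management server with the node IPs, while the Ceilometer VMs still exist, gives a quick yes/no per node.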

RatnajitHCL
Contributor

Hi Steven,

I have checked and found that all the hosts in the VIO management cluster are running the SSH client and server, and all the IP addresses assigned on the management network are unique. When the Ceilometer configuration process created the MongoDB VMs, I was able to SSH from the VIO management server to all three of them, and all of them were pingable from the management server. However, all the VMs are deleted when the configuration fails at 51%, so there is only a short window in which to check SSH from the MongoDB VMs.

I also observed that it is not possible to SSH from the Memcache DB VMs to the MongoDB VMs; it says "permission denied". All the other management VMs are able to SSH to the Memcache DB VMs. It seems that the SSH keys are not exchanged between them.

The challenge is that all the VMs are created at runtime and then deleted when the Ceilometer deployment fails, so what kind of configuration can I apply to them? I have only around 5-10 minutes before the VMs get deleted.

Thanks

Ratnajit

RatnajitHCL
Contributor

Hi Steven,

Can you please advise on how to proceed? Any additional information would be really helpful.

Thanks

Ratnajit

yjia
VMware Employee

Hi RatnajitHCL,

Can you please upload the OMS log?

Please run "viocli deployment getlogs" on the management server to collect all the logs.

Thanks

Yixing

RatnajitHCL
Contributor

Hi Yixing,

Please find the OMS log attached.

Let me know if you require any other information and I will get back to you.

Thanks

Ratnajit

yjia
VMware Employee

Please run "viocli deployment getlogs" to collect the full logs and upload them. Thanks.

The oms.log.1 file doesn't contain much information.

RatnajitHCL
Contributor

Hi Yixing,

Please find the complete logs attached. Let me know if you require any other information.

Thanks

Ratnajit

ZhangAdam
VMware Employee

It seems the environment is too slow:

2016-05-02 06:06:42,236 p=569 u=jarvis |  TASK: [mongodb | wait for mongodb server to start] ****************************
2016-05-02 06:07:23,518 p=569 u=jarvis |  ok: [10.110.50.84]
2016-05-02 06:11:42,738 p=569 u=jarvis |  failed: [10.110.50.86] => {"elapsed": 300, "failed": true}
2016-05-02 06:11:42,739 p=569 u=jarvis |  msg: Timeout when waiting for 127.0.0.1:27017
2016-05-02 06:11:42,778 p=569 u=jarvis |  failed: [10.110.50.85] => {"elapsed": 300, "failed": true}
2016-05-02 06:11:42,778 p=569 u=jarvis |  msg: Timeout when waiting for 127.0.0.1:27017
2016-05-02 06:11:42,787 p=569 u=jarvis |  FATAL: all hosts have already failed -- aborting

We need to extend the timeout to allow the MongoDB task to complete. The default timeout is 300 seconds; how about changing it from 300 to 900?
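For context, this task keeps trying 127.0.0.1:27017 until MongoDB accepts connections or the timeout expires; both slow nodes gave up at "elapsed": 300, which matches Ansible's wait_for default of 300 seconds. The minimal Python sketch below mimics that polling behaviour so the effect of the timeout is easy to see. It is an illustration, not Ansible's actual implementation, and wait_for_port and its interval parameter are hypothetical names.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=1.0):
    """Poll host:port until a TCP connection succeeds or timeout seconds pass.

    Returns the elapsed seconds on success; raises TimeoutError otherwise,
    similar to the "Timeout when waiting for host:port" failure in the log.
    """
    start = time.monotonic()
    deadline = start + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            pass  # not listening yet; try again after a short pause
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Timeout when waiting for {host}:{port}")
        time.sleep(interval)
```

With timeout=900 the same loop simply keeps polling three times as long before giving up, which is what the change below does for the real task.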

Locate the following file on OMS:

/var/lib/vio/ansible/roles/mongodb/tasks/main.yml

Change

- name: wait for mongodb server to start
  wait_for:
    port: 27017
  tags:
    - config

to

- name: wait for mongodb server to start
  wait_for:
    port: 27017
    timeout: 900
  tags:
    - config

There are two occurrences in the file; change both, then enable Ceilometer again and see if it works.
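Because the task appears twice, a small script can make the edit consistently. This is a minimal Python sketch that works on the raw text, assuming the port: 27017 lines look exactly like the snippet above; it is not a general YAML editor, and add_wait_for_timeout is a hypothetical helper name. It copies the indentation of the port line and skips any occurrence that already has a timeout: line right after it, so running it twice is harmless.

```python
def add_wait_for_timeout(text, port=27017, timeout=900):
    """Insert `timeout: N` after every `port: <port>` line that lacks one.

    Works line by line and reuses the port line's indentation, so it
    assumes the task layout shown in the post.
    """
    lines = text.splitlines(keepends=True)
    out = []
    for i, line in enumerate(lines):
        out.append(line)
        if line.strip() != f"port: {port}":
            continue
        next_line = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if next_line.startswith("timeout:"):
            continue  # already patched
        if not line.endswith("\n"):
            out[-1] = line + "\n"  # keep the new line on its own row
        indent = line[: len(line) - len(line.lstrip())]
        out.append(f"{indent}timeout: {timeout}\n")
    return "".join(out)
```

To use it, read /var/lib/vio/ansible/roles/mongodb/tasks/main.yml, keep a backup copy, and write the returned text back to the same path.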

RatnajitHCL
Contributor

Hi ZhangAdam,

Thanks for your reply. I made the changes as suggested, but the Ceilometer configuration still failed.

Please find the updated logs attached.

Please let me know if any other changes need to be done.

Thanks

Ratnajit

ZhangAdam
VMware Employee

Your MongoDB task succeeded, but the deployment failed on another task due to a timeout. You can check the progress in /var/log/jarvis/ansible.log to see which task failed:

2016-05-12 11:19:05,007 p=570 u=jarvis |  ok: [10.110.50.75]
2016-05-12 11:19:05,016 p=570 u=jarvis |  TASK: [config-local | create an admin tenant] *********************************
2016-05-12 11:19:05,542 p=570 u=jarvis |  ok: [10.110.50.74]
2016-05-12 11:19:05,542 p=570 u=jarvis |  TASK: [config-local | create a cloud admin user] ******************************
2016-05-12 11:19:06,692 p=570 u=jarvis |  ok: [10.110.50.74]
2016-05-12 11:19:06,692 p=570 u=jarvis |  TASK: [config-local | grant admin role to admin user on cloud tenant] *********
2016-05-12 11:19:07,255 p=570 u=jarvis |  ok: [10.110.50.74]
2016-05-12 11:19:07,255 p=570 u=jarvis |  TASK: [config-local | grant heat_stack_owner role to admin user on service tenant] ***
2016-05-12 11:19:07,808 p=570 u=jarvis |  ok: [10.110.50.74]
2016-05-12 11:19:07,809 p=570 u=jarvis |  TASK: [config-local | download stream-optimized image] ************************
2016-05-12 11:22:19,263 p=570 u=jarvis |  failed: [10.110.50.74] => {"failed": true}
2016-05-12 11:22:19,333 p=570 u=jarvis |  msg: failed to create temporary content file: timed out
2016-05-12 11:22:19,333 p=570 u=jarvis |  FATAL: all hosts have already failed -- aborting
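Finding the failed task by eye in a long ansible.log is tedious. A minimal Python sketch like the following pairs each "failed:" line with the most recent "TASK:" header. It assumes the line format seen in the excerpts in this thread (| TASK: [role | name] *** and | failed: [host] => ...); failed_tasks is a hypothetical helper name.

```python
import re

# Patterns based on the Jarvis/Ansible log lines quoted in this thread.
TASK_RE = re.compile(r"\|\s+TASK: \[(?P<task>.+?)\]")
FAILED_RE = re.compile(r"\|\s+failed: \[(?P<host>[^\]]+)\]")


def failed_tasks(lines):
    """Return (task, host) pairs for every 'failed:' line in an Ansible log."""
    current_task = None
    results = []
    for line in lines:
        task = TASK_RE.search(line)
        if task:
            current_task = task.group("task")
            continue
        failed = FAILED_RE.search(line)
        if failed:
            results.append((current_task, failed.group("host")))
    return results
```

For example, open /var/log/jarvis/ansible.log and pass the file object to failed_tasks; on the excerpt above it points at the config-local "download stream-optimized image" task on 10.110.50.74.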

You need to extend the timeout on the "download stream-optimized image" task as well.

Locate the following file on OMS:

/var/lib/vio/ansible/roles/config-local/tasks/main.yml

Change

- name: download stream-optimized image
  get_url:
    url: "http://{{ imageserver }}/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk"
    dest: /tmp/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk
  run_once: true

- name: import image into glance
  glance_image:
    auth_url: "{{ auth_url }}"
    login_tenant_name: "{{ admin_tenant_name }}"
    login_username: "{{ admin_user }}"
    login_password: "{{ admin_password }}"
    name: "{{ image_name | default('ubuntu-14.04-server-amd64') }}"
    disk_format: vmdk
# TODO(browne): Ansible glance_image module ignores these.  Fix upstream
#    min_ram: "{{ image_min_ram | default(512) }}"
#    min_disk: "{{ image_min_disk | default(5) }}"
    file: "/tmp/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk"
    is_public: True
    timeout: 1800
    endpoint_type: internalURL
  run_once: true

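to the following (this adds timeout: 30 to get_url, whose default is 10 seconds, and raises the glance_image timeout from 1800 to 3600):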

- name: download stream-optimized image
  get_url:
    url: "http://{{ imageserver }}/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk"
    dest: /tmp/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk
    timeout: 30
  run_once: true

- name: import image into glance
  glance_image:
    auth_url: "{{ auth_url }}"
    login_tenant_name: "{{ admin_tenant_name }}"
    login_username: "{{ admin_user }}"
    login_password: "{{ admin_password }}"
    name: "{{ image_name | default('ubuntu-14.04-server-amd64') }}"
    disk_format: vmdk
# TODO(browne): Ansible glance_image module ignores these.  Fix upstream
#    min_ram: "{{ image_min_ram | default(512) }}"
#    min_disk: "{{ image_min_disk | default(5) }}"
    file: "/tmp/{{ image_name | default('ubuntu-14.04-server-amd64') }}.vmdk"
    is_public: True
    timeout: 3600
    endpoint_type: internalURL
  run_once: true
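As far as I understand, the get_url timeout is a per-request socket timeout (applied while connecting and to each read) rather than a cap on the total download time, which would explain why a modest value like 30 seconds can be enough for a large image on a slow link. The minimal Python sketch below illustrates that pattern with the standard library's urllib.request.urlopen timeout and a temporary file that is renamed into place only after the download completes. It is not Ansible's implementation, and download_image is a hypothetical name.

```python
import os
import shutil
import tempfile
import urllib.request


def download_image(url, dest, timeout=30):
    """Download `url` to `dest` via a temp file in the same directory.

    `timeout` is the socket timeout passed to urlopen: it bounds the
    connect and each blocking read, not the total transfer time.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest))
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir)
        try:
            with os.fdopen(fd, "wb") as tmp:
                shutil.copyfileobj(resp, tmp)  # streams in chunks
            os.replace(tmp_path, dest)  # atomic rename on the same filesystem
        except BaseException:
            os.unlink(tmp_path)  # don't leave a partial temp file behind
            raise
    return dest
```

The temp-file-then-rename step means a timed-out transfer never leaves a truncated image at the destination path.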

RatnajitHCL
Contributor

Hi ZhangAdam,

I am attaching the new logs, and I can see new errors:

2016-05-13 08:21:42,491 p=568 u=jarvis |  TASK: [mongodb | ensure mongodb admin user is present] ************************
2016-05-13 08:21:43,641 p=568 u=jarvis |  failed: [10.110.50.84] => (item=%) => {"failed": true, "item": "%", "parsed": false}
2016-05-13 08:21:43,641 p=568 u=jarvis |  SUDO-SUCCESS-nxoqtweglscdaonhjmjzsobhaxdtuqrf
Traceback (most recent call last):
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127702.9-8027070333158/mongodb_user", line 1817, in <module>
    main()
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127702.9-8027070333158/mongodb_user", line 238, in main
    user_add(module, client, db_name, user, password, roles)
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127702.9-8027070333158/mongodb_user", line 142, in user_add
    db.add_user(user, password, None, roles=roles)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 871, in add_user
    (not uinfo["users"]), name, password, read_only, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 793, in _create_or_update_user
    self.command(command_name, name, **opts)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 454, in command
    codec_options, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 366, in _command
    allowable_errors)
  File "/usr/lib/python2.7/dist-packages/pymongo/pool.py", line 189, in command
    self._raise_connection_failure(error)
  File "/usr/lib/python2.7/dist-packages/pymongo/pool.py", line 316, in _raise_connection_failure
    raise error
pymongo.errors.NotMasterError: not master

2016-05-13 08:21:44,549 p=568 u=jarvis |  failed: [10.110.50.84] => (item=localhost) => {"failed": true, "item": "localhost", "parsed": false}
2016-05-13 08:21:44,549 p=568 u=jarvis |  SUDO-SUCCESS-vqkjvosxrzcbxaxnkcedbeyyqhjcelqe
Traceback (most recent call last):
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127703.93-8740217426243/mongodb_user", line 1817, in <module>
    main()
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127703.93-8740217426243/mongodb_user", line 238, in main
    user_add(module, client, db_name, user, password, roles)
  File "/home/viouser/.ansible/tmp/ansible-tmp-1463127703.93-8740217426243/mongodb_user", line 142, in user_add
    db.add_user(user, password, None, roles=roles)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 871, in add_user
    (not uinfo["users"]), name, password, read_only, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 793, in _create_or_update_user
    self.command(command_name, name, **opts)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 454, in command
    codec_options, **kwargs)
  File "/usr/lib/python2.7/dist-packages/pymongo/database.py", line 366, in _command
    allowable_errors)
  File "/usr/lib/python2.7/dist-packages/pymongo/pool.py", line 189, in command
    self._raise_connection_failure(error)
  File "/usr/lib/python2.7/dist-packages/pymongo/pool.py", line 316, in _raise_connection_failure
    raise error
pymongo.errors.NotMasterError: not master

Please help me resolve the issue.

Thanks

Ratnajit

RatnajitHCL
Contributor

Hi ZhangAdam,

Thanks a lot for your time and for helping to resolve the issue.

As suggested, I reduced the number of MongoDB nodes to 1 and then ran the installation again. Ceilometer is finally installed and shows as enabled.

Many Thanks

Ratnajit
