VMware Cloud Community
jgover
Enthusiast

Error : No valid host was found. There are not enough hosts available. Code 500

Hi,

Our DevTeams can't create VMs anymore from the VIO Horizon Dashboard. They receive this error: "Error : No valid host was found. There are not enough hosts available. Code 500"

From searching, this seems like a generic OpenStack error. I am new to OpenStack and VIO; I just inherited the environment. Which logs should I start looking through for relevant errors?

Nothing in the compute node's logs looks significant to me.

[root@vmwarelab1:/var/log] ls

Xorg.log              jumpstart-stdout.log  vmauthd.log

auth.log              lacp.log              vmkdevmgr.log

boot.gz               nfcd.log              vmkernel.log

clomd.log             osfsd.log             vmkeventd.log

configRP.log          rabbitmqproxy.log     vmksummary.log

dhclient.log          rhttpproxy.log        vmkwarning.log

epd.log               sdrsinjector.log      vmware

esxcli.log            shell.log             vobd.log

esxupdate.log         smbios.bin            vprobe.log

fdm.log               storagerm.log         vprobed.log

hostd-probe.log       swapobjd.log          vpxa.log

hostd.log             sysboot.log           vsanvpd.log

hostprofiletrace.log  syslog.log            vvold.log

iofiltervpd.log       tallylog

ipmi                  usb.log

Thanks

17 Replies
rpellet
VMware Employee

Those are the host logs for the ESXi host. You want to log into the VIO management server first and then SSH into one of the controllers. /var/log/nova/nova-api.log and /var/log/nova/nova-scheduler.log would be helpful.
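For anyone following along, a minimal way to pull the relevant lines out of those logs might look like this. This is a hedged sketch: the paths are the ones named above, and the two sample lines below merely stand in for a real nova-scheduler.log so the commands are self-contained.

```shell
# Sketch: filter scheduler failures the way you would on a VIO controller.
# On a real controller you would grep /var/log/nova/nova-scheduler.log;
# the sample file below just makes this runnable anywhere.
cat > /tmp/nova-scheduler.sample <<'EOF'
2016-12-16 21:14:40.100 3282 INFO nova.scheduler.host_manager [-] host filter passed
2016-12-16 21:14:44.900 3282 ERROR nova.scheduler.utils [-] NoValidHost: No valid host was found.
EOF
# Print only the lines that explain why scheduling failed:
grep -E 'ERROR|NoValidHost' /tmp/nova-scheduler.sample
```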

jgover
Enthusiast

Hi,

OK Thanks for that.

I tried building an instance while tailing nova-compute.log; "did not match any datastores" stands out to me. I asked the team whether they renamed or otherwise changed any of the datastores. I also went through the vSphere logs looking for datastore modifications and did not find anything.

I came across this KB article, but they said the datastores have always had those names:

https://kb.vmware.com/kb/2147307


Do you have any more ideas? Am I on the right track now?

Thanks

Jeff


2016-12-16 21:14:44.944 3282 DEBUG nova.compute.claims [req-a796912f-c9b2-4603-be61-9058b678a872 301cd76fa25547dca53d4033505d3f1c 7f69e085a9054c199b43b9c4eab1b4e9 - - -] [instance: d6ec55bb-957a-426e-806d-71900bc40ee3] Aborting claim: [Claim: 2048 MB memory, 20 GB disk] abort /usr/lib/python2.7/dist-packages/nova/compute/claims.py:130

2016-12-16 21:14:44.945 3282 DEBUG oslo_concurrency.lockutils [req-a796912f-c9b2-4603-be61-9058b678a872 301cd76fa25547dca53d4033505d3f1c 7f69e085a9054c199b43b9c4eab1b4e9 - - -] Lock "compute_resources" acquired by "abort_instance_claim" :: waited 0.000s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:444

2016-12-16 21:14:44.976 3282 INFO nova.scheduler.client.report [req-a796912f-c9b2-4603-be61-9058b678a872 301cd76fa25547dca53d4033505d3f1c 7f69e085a9054c199b43b9c4eab1b4e9 - - -] Compute_service record updated for ('compute01', resgroup-328(VIO))

2016-12-16 21:14:44.977 3282 DEBUG oslo_concurrency.lockutils [req-a796912f-c9b2-4603-be61-9058b678a872 301cd76fa25547dca53d4033505d3f1c 7f69e085a9054c199b43b9c4eab1b4e9 - - -] Lock "compute_resources" released by "abort_instance_claim" :: held 0.032s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:456

2016-12-16 21:14:44.979 3282 DEBUG nova.compute.utils [req-a796912f-c9b2-4603-be61-9058b678a872 301cd76fa25547dca53d4033505d3f1c 7f69e085a9054c199b43b9c4eab1b4e9 - - -] [instance: d6ec55bb-957a-426e-806d-71900bc40ee3] Datastore regex datastore1\ \(3\)|datastore1\ \(2\)|datastore1\ \(1\) did not match any datastores notify_about_instance_usage /usr/lib/python2.7/dist-packages/nova/compute/utils.py:310

2016-12-16 21:14:44.981 3282 DEBUG nova.compute.manager [req-a796912f-c9b2-4603-be61-9058b678a872 301cd76fa25547dca53d4033505d3f1c 7f69e085a9054c199b43b9c4eab1b4e9 - - -] [instance: d6ec55bb-957a-426e-806d-71900bc40ee3] Build of instance d6ec55bb-957a-426e-806d-71900bc40ee3 was re-scheduled: Datastore regex datastore1\ \(3\)|datastore1\ \(2\)|datastore1\ \(1\) did not match any datastores _do_build_and_run_instance /usr/lib/python2.7/dist-packages/nova/compute/manager.py:2275

2016-12-16 21:14:45.122 3282 DEBUG oslo_concurrency.lockutils [req-a796912f-c9b2-4603-be61-9058b678a872 301cd76fa25547dca53d4033505d3f1c 7f69e085a9054c199b43b9c4eab1b4e9 - - -] Lock "d6ec55bb-957a-426e-806d-71900bc40ee3" released by "_locked_do_build_and_run_instance" :: held 1.846s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:456

2016-12-16 21:14:54.888 3282 DEBUG nova.openstack.common.periodic_task [req-f3442395-d611-4a15-9a09-24b0581d7599 - - - - -] Running periodic task ComputeManager._poll_rebooting_instances run_periodic_tasks /usr/lib/python2.7/dist-packages/nova/openstack/common/periodic_task.py:219

2016-12-16 21:14:54.890 3282 DEBUG nova.openstack.common.loopingcall [req-f3442395-d611-4a15-9a09-24b0581d7599 - - - - -] Dynamic looping call <bound method Service.periodic_tasks of <nova.service.Service object at 0x7f3051916550>> sleeping for 14.99 seconds _inner /usr/lib/python2.7/dist-packages/nova/openstack/common/loopingcall.py:132

2016-12-16 21:15:09.884 3282 DEBUG nova.openstack.common.periodic_task [req-f3442395-d611-4a15-9a09-24b0581d7599 - - - - -] Running periodic task ComputeManager._sync_scheduler_instance_info run_periodic_tasks /usr/lib/python2.7/dist-packages/nova/openstack/common/periodic_task.py:219

2016-12-16 21:15:10.059 3282 DEBUG nova.openstack.common.loopingcall [req-f3442395-d611-4a15-9a09-24b0581d7599 - - - - -] Dynamic looping call <bound method Service.periodic_tasks of <nova.service.Service object at 0x7f3051916550>> sleeping for 5.00 seconds _inner /usr/lib/python2.7/dist-packages/nova/openstack/common/loopingcall.py:132

2016-12-16 21:15:15.059 3282 DEBUG nova.openstack.common.periodic_task [req-f3442395-d611-4a15-9a09-24b0581d7599 - - - - -] Running periodic task ComputeManager._check_instance_build_time run_periodic_tasks /usr/lib/python2.7/dist-packages/nova/openstack/common/periodic_task.py:219

2016-12-16 21:15:15.061 3282 DEBUG nova.openstack.common.loopingcall [req-f3442395-d611-4a15-9a09-24b0581d7599 - - - - -] Dynamic looping call <bound method Service.periodic_tasks of <nova.service.Service object at 0x7f3051916550>> sleeping for 4.83 seconds _inner /usr/lib/python2.7/dist-packages/nova/openstack/common/loopingcall.py:132

2016-12-16 21:15:19.888 3282 DEBUG nova.openstack.common.periodic_task [req-f3442395-d611-4a15-9a09-24b0581d7599 - - - - -] Running periodic task ComputeManager._reclaim_queued_deletes run_periodic_tasks /usr/lib/python2.7/dist-packages/nova/openstack/common/periodic_task.py:219

2016-12-16 21:15:19.889 3282 DEBUG nova.compute.manager [req-f3442395-d611-4a15-9a09-24b0581d7599 - - - - -] CONF.reclaim_instance_interval <= 0, skipping... _reclaim_queued_deletes /usr/lib/python2.7/dist-packages/nova/compute/manager.py:6297

2016-12-16 21:15:19.890 3282 DEBUG nova.openstack.common.loopingcall [req-f3442395-d611-4a15-9a09-24b0581d7599 - - - - -] Dynamic looping call <bound method Service.periodic_tasks of <nova.service.Service object at 0x7f3051916550>> sleeping for 1.99 seconds _inner /usr/lib/python2.7/dist-

gjayavelu
VMware Employee

Can you rename the datastores so they contain no spaces, and then add them to the nova datastores using the "add nova datastore" workflow in the Web Client?

For example: rename datastore1 (3) to datastore13, or something similar.
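The reason renaming helps: nova only uses datastores whose names match its configured datastore regex, and the failing pattern in the log escapes the spaces and parentheses literally. A quick check (names taken from this thread; grep's ERE alternation stands in for nova's matching) shows the stale pattern matches the original names but not a renamed one, which is why the "add nova datastore" workflow has to be re-run after renaming so the regex gets regenerated:

```shell
# The datastore regex from the nova-compute log, tried against two of the
# old space-containing names and one renamed name. Only the old names match.
printf '%s\n' 'datastore1 (1)' 'datastore1 (2)' 'datastore13' |
  grep -cE 'datastore1 \(3\)|datastore1 \(2\)|datastore1 \(1\)'
# prints 2: the renamed datastore no longer matches the stale pattern
```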

gjayavelu
VMware Employee

See the "Add Storage to the Compute Node" section in http://pubs.vmware.com/integrated-openstack-3/topic/com.vmware.ICbase/PDF/integrated-openstack-3-ins...

for how to add the datastores after renaming them without spaces.

jgover
Enthusiast

I am running 2.0. When I try to add storage, I get:

"The operation is allowed only when the deployment is running or in a configuration error state"

Also, regarding this statement from the 2.5 and 3.0 documentation: what will actually happen? Will the instances/VMs stay up and running? I am in a weird state where, if I shut down an instance/VM, it will not start back up and gives the same error I started this thread with.

"Adding a datastore to the Compute node causes the Nova service to restart, which might cause a temporary disruption to the OpenStack services in general."

Thanks

Jeff

lserpietri
Enthusiast

Hi Jeff,

So you should check the status of your deployment. Is it stopped?

The statement in the documentation means that during the Nova restart you will not be able to interact with the service (e.g., launch new instances), but existing VMs are not affected by the operation.

Hope this helps!

jgover
Enthusiast

Hi,

The plot thickens as I dig deeper into the tasks you sent me on.

So here is what I found. From searching, this error relates to failed deployments, patches, or upgrades. But remember, the environment is up and running; we just cannot provision or restart an instance.

[screenshot: pastedImage_0.png]

Also, does this screen mean it still thinks it's in deployment limbo? What would happen if I clicked "Stop Openstack Deployment" here?

[screenshot: pastedImage_1.png]

Here is the status. I think a better question is: which command-line utilities can I use to validate this environment, or, as you put it, to "check the status of your deployment"? Something like openstack-status on Red Hat?

[screenshot: pastedImage_2.png]

Thanks

Jeff

gjayavelu
VMware Employee

It looks like something is wrong on vCenter; I'm guessing it is datastore related. Please upload the VIO log bundle:

SSH to the OpenStack Management Server, then run:

sudo su

viocli deployment getlogs

jgover
Enthusiast

Errors:

"[instance: d6ec55bb-957a-426e-806d-71900bc40ee3] Datastore regex datastore1\ \(3\)|datastore1\ \(2\)|datastore1\ \(1\) did not match any datastores notify_about_instance_usage /usr/lib/python2.7/dist-packages/nova/compute/utils.py:310"

As I investigate further: this environment was originally set up with one host, maybe two, and now has a total of four hosts, BUT with no shared storage. Apparently the environment ran fine for about 1.5 years, until DevOps came to me this week. P.S. I know from the requirements that shared storage is required, but how could this have worked for so long, and how did it even work in the first place?

With that said, I attached an NFS datastore from a NAS and added it to the vSphere hosts. When I try to add the datastore for Nova storage, I get the same error:

"The operation is allowed only when the deployment is running or in a configuration error state"

My plan was to set up a shared-storage environment, back everything up, and then migrate all VMs and instances from the local VMFS datastores to the new shared datastore. Once that proved stable, I would upgrade to VIO 3.0.

Because of the errors above, though, I can't get past the first step.

Thanks

Jeff

yjia
VMware Employee

VIO talks to the cluster rather than to individual ESXi hosts, so you can add any hosts you want and VIO will still work. But please keep in mind that VIO will only use the datastores specified during VIO setup.
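For reference, that datastore restriction lives in nova's VMware driver configuration on the compute node. The fragment below is illustrative only, not taken from this deployment: the option names come from nova's [vmware] section, the placeholder values are assumptions, and the datastore_regex value shown is the failing one from the log earlier in this thread.

```ini
[vmware]
host_ip = <vcenter-ip>
cluster_name = <compute-cluster-name>
# Only datastores whose names match this pattern are used by nova:
datastore_regex = datastore1\ \(3\)|datastore1\ \(2\)|datastore1\ \(1\)
```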

Currently your deployment is in a bad state. We need your help: please upload your logs so that we can identify the root cause and solve the issue together.

Please use the commands from the earlier reply (Re: Error : No valid host was found. There are not enough hosts available. Code 500) to collect the logs.

jgover
Enthusiast

Hi,

Where do I upload the logs? The bundle is too big to attach.

Thanks

Jeff

jgover
Enthusiast

Hi,

Does this error from /var/log/jarvis/ansible.log help?

Thanks

Jeff

2016-10-19 14:52:20,087 p=409 u=jarvis |  GATHERING FACTS ***************************************************************

2016-10-19 14:52:38,169 p=409 u=jarvis |  fatal: [10.10.10.77] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue

2016-10-19 14:52:38,170 p=409 u=jarvis |  TASK: [resume-lb | Gracefully reload HAproxies config to resume service] ******

2016-10-19 14:52:38,176 p=409 u=jarvis |  FATAL: no hosts matched or all hosts have already failed -- aborting

2016-10-19 15:13:47,071 p=409 u=jarvis |  PLAY [lb] *********************************************************************

2016-10-19 15:13:47,072 p=409 u=jarvis |  GATHERING FACTS ***************************************************************

2016-10-19 15:13:50,085 p=409 u=jarvis |  fatal: [10.10.10.77] => SSH encountered an unknown error during the connection. We recommend you re-run the command using -vvvv, which will enable SSH debugging output to help diagnose the issue

2016-10-19 15:13:50,086 p=409 u=jarvis |  TASK: [pause-lb | Gracefully reload HAproxies config to halt service] *********

2016-10-19 15:13:50,092 p=409 u=jarvis |  FATAL: no hosts matched or all hosts have already failed -- aborting

rpellet
VMware Employee

The message indicates that we are unable to open an SSH session into that VM for some reason. What is 10.10.10.77? Can you open a console to the VM with that address? Can you log in to that VM if you get a console?

jgover
Enthusiast

Hi,

How do I get the 179MB log file up to you folks?

[screenshot: pastedImage_0.png]

Thanks

Jeff

jgover
Enthusiast

Hi,

Any ideas on how I would get the logs to you? (See my previous request.)

Thanks

yjia
VMware Employee

Hi, sorry for the late response.

If you have a VMware contact person, they can help you upload large files through a service request.

If not, please untar the 20xxxx.tgz file first and then upload the support_vio_mgmt.tgz file.

jgover
Enthusiast

Sorry, I too was side-tracked on other projects.

This whole install I inherited is just bad.

What if I reinstalled VIO with shared storage available (the lack of which is the whole problem here)? I don't even know how it got this far without shared storage.

Is the reinstall intelligent enough to keep the network and all other configuration, and move over to the shared storage?

Thanks

Jeff
