VMware Cloud Community
AlexandreMARTIN
Contributor

VIO 6 DHCP agents bug? (with solution)

Hello,

Today I found one more bug in this product; let me explain:

The DHCP agents were giving an IP address to my instances, but the instances never took it: running dhclient on an instance never received a DHCPOFFER from the DHCP agents.

First, list the pod names of your OpenStack deployment:

kubectl get pods --all-namespaces | grep dhcp

So I looked at the DHCP agent logs with this command on the VIO controller:

kubectl logs -f --namespace=openstack neutron-dhcp-agent-default-***

NB: *** is the unique suffix of each pod name, as shown by the previous command.
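Rather than copying each *** suffix by hand, the DHCP agent pod names can be pulled out of the listing. A minimal sketch, assuming the default column layout of `kubectl get pods --all-namespaces` (NAMESPACE first, NAME second) and that the agents live in the `openstack` namespace as in the commands in this post; `dhcp_agent_pods` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Print the names of the neutron DHCP agent pods in the "openstack" namespace.
# Expects the output of `kubectl get pods --all-namespaces` on stdin, whose
# first two columns are NAMESPACE and NAME (the header line is skipped
# because its first column is "NAMESPACE", not "openstack").
dhcp_agent_pods() {
  awk '$1 == "openstack" && $2 ~ /^neutron-dhcp-agent-default-/ { print $2 }'
}

# Example (on the host where kubectl is configured):
#   kubectl get pods --all-namespaces | dhcp_agent_pods
```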

This message kept repeating in the log:

2019-12-03 17:14:56.735 49 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/dhcp_agent.ini', '--config-file', '/var/lib/neutron/dhcp/dhcp_override_mac.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpNauu1q/privsep.sock']
Changing password for root.
2019-12-03 17:14:56.752 49 WARNING oslo.privsep.daemon [-] privsep log: sudo: Account or password is expired, reset your password and try again
2019-12-03 17:14:56.752 49 WARNING oslo.privsep.daemon [-] privsep log: sudo: no tty present and no askpass program specified
2019-12-03 17:14:56.752 49 WARNING oslo.privsep.daemon [-] privsep log: sudo: unable to change expired password: Authentication token manipulation error
2019-12-03 17:14:56.758 49 CRITICAL oslo.privsep.daemon [-] privsep helper command exited non-zero (1)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 576b74f4-f765-4955-a1f9-e7681bbef3a6.: FailedToDropPrivileges: privsep helper command exited non-zero (1)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 157, in call_driver
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     getattr(driver, action)(**action_kwargs)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 218, in enable
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     common_utils.wait_until_true(self._enable)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 691, in wait_until_true
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     while not predicate():
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 229, in _enable
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     interface_name = self.device_manager.setup(self.network)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1506, in setup
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     ip_lib.IPWrapper().ensure_namespace(network.namespace)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 236, in ensure_namespace
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     if not self.netns.exists(name):
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 797, in exists
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     return network_namespace_exists(name)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 1002, in network_namespace_exists
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     output = list_network_namespaces(**kwargs)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 991, in list_network_namespaces
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     return privileged.list_netns(**kwargs)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 240, in _wrap
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     self.start()
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 251, in start
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     channel = daemon.RootwrapClientChannel(context=self)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 328, in __init__
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     raise FailedToDropPrivileges(msg)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent FailedToDropPrivileges: privsep helper command exited non-zero (1)
2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent

As you can see, the key lines are the sudo warnings, and the most important one is:

sudo: Account or password is expired, reset your password and try again
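When several agent pods are running, a quick way to tell whether a given pod's log shows this particular failure is to search for the expired-password message or the resulting FailedToDropPrivileges error. A minimal sketch; `has_expired_password_symptom` is a hypothetical helper name, and the two search strings are simply copied from the log above:

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) if the log text on stdin contains the
# expired-password symptom from the DHCP agent log above.
has_expired_password_symptom() {
  grep -q -e 'Account or password is expired' -e 'FailedToDropPrivileges'
}

# Example (no -f, so the command ends once the current log has been read):
#   kubectl logs --namespace=openstack neutron-dhcp-agent-default-*** \
#     | has_expired_password_symptom && echo "password expiry problem"
```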

Let's get rid of this problem.

Open a shell on each DHCP agent pod:

kubectl exec -it --namespace=openstack neutron-dhcp-agent-default-*** -- /bin/bash

You will be logged in as root; just try a simple sudo command:

sudo ls

It tells you that the password is too old and PAM is broken :(

Now we need to turn off password aging while keeping the current password. I think there is no password set; can VMware confirm?

But whatever.

In /etc/passwd I found two accounts, one for root (of course!) and one for neutron. Let's turn off aging on both; you may only need it for root.

So now run these commands:

passwd -x -1 root

passwd -x -1 neutron

Do this on every DHCP agent pod you have.

DHCP will now work.
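Running the two passwd commands in every DHCP agent pod by hand is repetitive. A minimal dry-run sketch that prints one `kubectl exec` command per pod and account, so they can be reviewed before being run; `print_aging_fix_commands` is a hypothetical helper, and it assumes the pods are in the `openstack` namespace and that `passwd -x` runs without prompting, as it did in this thread:

```shell
#!/usr/bin/env bash
# Read pod names (one per line) on stdin and print, for each pod, the
# kubectl commands that disable password aging for root and neutron.
# Nothing is executed; review the output, then pipe it to `sh` to apply it.
print_aging_fix_commands() {
  local pod user
  while IFS= read -r pod; do
    [ -n "$pod" ] || continue   # skip blank lines
    for user in root neutron; do
      printf 'kubectl exec --namespace=openstack %s -- passwd -x -1 %s\n' "$pod" "$user"
    done
  done
}

# Example:
#   kubectl get pods --namespace=openstack --no-headers \
#       -o custom-columns=NAME:.metadata.name \
#     | grep '^neutron-dhcp-agent-default-' | print_aging_fix_commands
```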

I noticed that /etc/login.defs (the configuration file for the password-aging tools such as chage) has this value:

PASS_MAX_DAYS    90

Over the last few days we've seen people complaining about DHCP. I think this is a release bug in the product: 90 days matches the 6.0 release timing. It was available to the public on 3 September, but the developers probably built it 2 or 3 weeks before that date...
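The 90-day arithmetic behind this guess can be checked against the chage output posted later in this thread (last change Aug 26, 2019; expiry Nov 24, 2019). A minimal sketch assuming GNU date (its `-d` relative-date syntax differs from BSD/macOS date); `expiry_date` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print the date a password expires, given the last change date (YYYY-MM-DD)
# and the maximum password age in days (PASS_MAX_DAYS). Requires GNU date.
expiry_date() {
  local last_change="$1" max_days="$2"
  date -u -d "$last_change $max_days days" +%F
}

# Example:
#   expiry_date 2019-08-26 90   # prints 2019-11-24
```

That date falls before the 2019-12-03 timestamps in the log above, which is consistent with the passwords having already expired when the DHCP agents failed.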

I'm very unsatisfied with this product; it's full of bugs that I have to fix or work around before VMware does. WTF??

Please also look at this bug I've had since I started using VIO 6.0; the console never worked well...

Alexandre MARTINS on LinkedIn: "WTF #vmware fresh install console fully bugged VMware Integrated Ope...

Regards,

Alexandre MARTINS.

https://www.linkedin.com/in/al3xmartins/
OsburnM
Hot Shot

So unfortunately, while this does solve my VIO6 DHCP issue (after doing it in all the neutron-dhcp-agent-default-***, neutron-metadata-agent-default-***, and neutron-server-**-** pods), it uncovered an even bigger problem: all the service accounts are set to a 90-day password expiry. If you simulate a hard shutdown / lights-out scenario, the deployment is unable to recover. Once your deployment completes, and before it reboots or restarts, you'll need to fix all of these, or the next time you have a major outage it won't come back.
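Which service accounts in a pod carry a finite password lifetime can be read from its /etc/shadow, whose fifth colon-separated field is the maximum password age in days. A minimal sketch; `aging_accounts` is a hypothetical helper name, and it assumes you can read /etc/shadow (root inside the pod, as in the shells opened in this thread):

```shell
#!/usr/bin/env bash
# List accounts whose password has a finite maximum age.
# Input: /etc/shadow-format lines on stdin
#   (name:password:lastchg:min:max:warn:inactive:expire:reserved).
# Output: "<user> <max days>" for each account whose max field is set,
# is not -1, and is below 99999.
aging_accounts() {
  awk -F: '$5 != "" && $5 != "-1" && $5 + 0 < 99999 { print $1, $5 }'
}

# Example:
#   kubectl exec --namespace=openstack neutron-dhcp-agent-default-*** -- \
#     cat /etc/shadow | aging_accounts
```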

Entirely unusable product out of the box!

Very sad.

Chandler_Zhang
VMware Employee

Thanks for trying VIO 6! This is a known issue; please try the following workaround. A KB article will be published soon:

1. For the management VM:

   1.1 SSH to the management VM and run the following commands:

       chage -I -1 -m 0 -M 99999 -E -1 root

       chage -I -1 -m 0 -M 99999 -E -1 vioadmin

2. For each controller node VM: we don't set a password for the vioadmin or root user, so you can't log in from the vSphere Web UI. Do the following instead:

   2.1 Break in as the root user on each controller node, following this doc:

       https://vmware.github.io/photon/assets/files/html/3.0/photon_troubleshoot/resetting-a-lost-root-pass...

       Then run the following commands:

       chage -I -1 -m 0 -M 99999 -E -1 root

       chage -I -1 -m 0 -M 99999 -E -1 vioadmin

3. You should then be able to generate logs and SSH into the controllers.
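The chage workaround above can be wrapped so the same flags are applied to each account, with a dry-run mode for checking first. The flag comments reflect chage's documented meanings; `apply_aging_workaround` and the `DRY_RUN` variable are hypothetical names, and it must be run as root on the management VM or controller:

```shell
#!/usr/bin/env bash
# Apply the password-aging workaround to each account named as an argument.
#   -I -1     : no inactivity period after the password expires
#   -m 0      : no minimum number of days between password changes
#   -M 99999  : maximum password age of 99999 days (effectively never)
#   -E -1     : no account expiration date
# Set DRY_RUN=1 to print the commands instead of running them.
apply_aging_workaround() {
  local user
  for user in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo chage -I -1 -m 0 -M 99999 -E -1 "$user"
    else
      chage -I -1 -m 0 -M 99999 -E -1 "$user"
    fi
  done
}

# Example:
#   DRY_RUN=1 apply_aging_workaround root vioadmin   # preview
#   apply_aging_workaround root vioadmin             # apply
```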

The issue will also be fixed in the next patch release.

OsburnM
Hot Shot

After a clean re-install, before attempting a new deployment, I unlocked root & vioadmin on the vio-manager, then modified /etc/login.defs to set the maximum password age to 99999 days. Then I opened the controller base image using the break-in method to do the same there, including its /etc/login.defs. Then I shut it back down.
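Editing PASS_MAX_DAYS can be scripted with sed. A minimal sketch; `set_pass_max_days` is a hypothetical helper and assumes a numeric day count. Per the login.defs(5) man page, the aging values in this file are applied when accounts are created, so existing accounts still need chage or `passwd -x`:

```shell
#!/usr/bin/env bash
# Set PASS_MAX_DAYS in a login.defs-style file (default /etc/login.defs),
# keeping a .bak copy of the original. Commented-out lines are left alone
# because the pattern is anchored at the start of the line.
set_pass_max_days() {
  local days="$1" file="${2:-/etc/login.defs}"
  sed -i.bak "s/^PASS_MAX_DAYS[[:space:]].*/PASS_MAX_DAYS $days/" "$file"
}

# Example, as root:
#   set_pass_max_days 99999
```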

Then I went into the VIO-Manager UI and started my deployment.

The deployment completes successfully and all services show as running.

From there I run 'viossh controller-*******', check that I can run 'sudo su', and use 'chage -l root' to verify that the root & vioadmin passwords are no longer expired.

***HOWEVER***

ALL of the service account passwords inside the pod containers are expired! So even with all of the above, if I simulate a lights-out scenario, the deployment can't come back up and I have to rebuild it entirely.

root@vio-manager [ ~ ]# kubectl exec -it --namespace=openstack neutron-dhcp-agent-default-nlp4h -- /bin/bash
[root@controller-v6gkckprzs /]# chage -l root
You are required to change your password immediately (password expired)
chage: PAM: Authentication token is no longer valid; new one required
[root@controller-v6gkckprzs /]# passwd -x -1 root
passwd: password expiry information changed.
[root@controller-v6gkckprzs /]# chage -l neutron
Last password change                                    : Aug 26, 2019
Password expires                                        : Nov 24, 2019
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 0
Maximum number of days between password change          : 90
Number of days of warning before password expires       : 7
[root@controller-v6gkckprzs /]# chage -l root
Last password change                                    : Aug 26, 2019
Password expires                                        : never
Password inactive                                       : never
Account expires                                         : never
Minimum number of days between password change          : 0
Maximum number of days between password change          : -1
Number of days of warning before password expires       : 7
[root@controller-v6gkckprzs /]#
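To check many accounts or pods without reading the whole chage listing, the `Password expires` line can be tested directly. A minimal sketch; `password_never_expires` is a hypothetical helper, and it assumes English (C locale) chage output in the format of the transcript above:

```shell
#!/usr/bin/env bash
# Succeed (exit 0) only if `chage -l USER` output on stdin contains a
# "Password expires" line whose value is "never". Any other value, or a
# missing line (e.g. chage failed with a PAM error), makes it fail.
password_never_expires() {
  awk -F': ' '
    /^Password expires/ {
      value = $2
      sub(/[ \t]+$/, "", value)   # drop trailing whitespace
      found = 1
      ok = (value == "never")
    }
    END { exit !(found && ok) }'
}

# Example, inside a pod:
#   LC_ALL=C chage -l neutron | password_never_expires || echo "neutron password still expires"
```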

MentzerJ
Contributor

Can you speak to when this patch will be released?

Chandler_Zhang
VMware Employee

Thanks for the comments! We are taking these lights-out scenarios into account for the upcoming patch release.

rpellet
VMware Employee

Please see the VMware Knowledge Base article. A new build of VIO 6.0 has been released that addresses the password expiration issues.
