6 Replies Latest reply on Jan 8, 2020 8:18 AM by rpellet

    VIO 6 DHCP Agents Bug? with solution

    AlexandreMARTINS Lurker

      Hello,

       

      Today I found one more bug in this product. Let me explain:

       

      The DHCP agents were giving IP addresses to my instances, but the instances never took them. Running dhclient by hand on an instance never got a DHCPOFFER from the DHCP agents.

       

      First, list the pod names of your OpenStack deployment:

       

      kubectl get pods --all-namespaces | grep dhcp

       

      Then I looked at the DHCP agent logs with this command on the VIO controller:

       

      kubectl logs -f --namespace=openstack neutron-dhcp-agent-default-***

       

      NB: *** stands for the unique suffix of each pod name.
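To avoid copying each pod's unique suffix by hand, the two commands above can be combined. This is only a sketch: it assumes the agents run in the `openstack` namespace and that their pod names start with `neutron-dhcp-agent-default-`, as shown in this post. The function names are made up for illustration.

```shell
# Hypothetical helper: print the names of the DHCP agent pods.
# Assumes the "openstack" namespace and the pod name prefix seen in this thread.
list_dhcp_agent_pods() {
    kubectl get pods --namespace=openstack -o name \
        | sed 's|^pod/||' \
        | grep '^neutron-dhcp-agent-default-'
}

# Hypothetical helper: look for the sudo expiry warning in each agent's
# recent log lines (the last 200 here, an arbitrary choice).
check_dhcp_agent_logs() {
    for pod in $(list_dhcp_agent_pods); do
        echo "== $pod"
        kubectl logs --namespace=openstack --tail=200 "$pod" \
            | grep 'password is expired' || echo "no expiry message found"
    done
}
```

Calling `check_dhcp_agent_logs` on the node where you normally run kubectl prints one section per agent pod.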

       

      This message was repeating over and over:

       

      2019-12-03 17:14:56.735 49 INFO oslo.privsep.daemon [-] Running privsep helper: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--config-file', '/etc/neutron/neutron.conf', '--config-file', '/etc/neutron/dhcp_agent.ini', '--config-file', '/var/lib/neutron/dhcp/dhcp_override_mac.ini', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpNauu1q/privsep.sock']

      Changing password for root.

      2019-12-03 17:14:56.752 49 WARNING oslo.privsep.daemon [-] privsep log: sudo: Account or password is expired, reset your password and try again

      2019-12-03 17:14:56.752 49 WARNING oslo.privsep.daemon [-] privsep log: sudo: no tty present and no askpass program specified

      2019-12-03 17:14:56.752 49 WARNING oslo.privsep.daemon [-] privsep log: sudo: unable to change expired password: Authentication token manipulation error

      2019-12-03 17:14:56.758 49 CRITICAL oslo.privsep.daemon [-] privsep helper command exited non-zero (1)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent [-] Unable to enable dhcp for 576b74f4-f765-4955-a1f9-e7681bbef3a6.: FailedToDropPrivileges: privsep helper command exited non-zero (1)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent Traceback (most recent call last):

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/dhcp/agent.py", line 157, in call_driver

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     getattr(driver, action)(**action_kwargs)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 218, in enable

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     common_utils.wait_until_true(self._enable)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 691, in wait_until_true

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     while not predicate():

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 229, in _enable

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     interface_name = self.device_manager.setup(self.network)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/dhcp.py", line 1506, in setup

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     ip_lib.IPWrapper().ensure_namespace(network.namespace)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 236, in ensure_namespace

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     if not self.netns.exists(name):

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 797, in exists

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     return network_namespace_exists(name)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 1002, in network_namespace_exists

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     output = list_network_namespaces(**kwargs)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 991, in list_network_namespaces

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     return privileged.list_netns(**kwargs)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 240, in _wrap

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     self.start()

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/priv_context.py", line 251, in start

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     channel = daemon.RootwrapClientChannel(context=self)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent   File "/usr/lib/python2.7/site-packages/oslo_privsep/daemon.py", line 328, in __init__

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent     raise FailedToDropPrivileges(msg)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent FailedToDropPrivileges: privsep helper command exited non-zero (1)

      2019-12-03 17:14:56.759 49 ERROR neutron.agent.dhcp.agent

       

       

      The most important part of this output is:

       

      sudo: Account or password is expired, reset your password and try again

       

       

       

      Let's get rid of this problem:

       

      Open a shell on each agent:

       

      kubectl exec -it --namespace=openstack neutron-dhcp-agent-default-*** -- /bin/bash

       

      You will be logged in as root. Just try a simple sudo command:

       

      sudo ls

       

      This tells you that the password has aged out and PAM rejects it.

       

      We now need to turn off password aging while keeping the current password. I think there is no password set; can VMware confirm?

      But whatever.

       

      In /etc/passwd I found two accounts, one for root (of course!) and one for neutron. Let's turn off aging on both, though maybe you only need it for root.

      So now run these commands:

       

      passwd -x -1 root

      passwd -x -1 neutron

       

      Do this on every DHCP agent pod you have.
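If you have several agent pods, a loop saves opening a shell in each one. The following is a sketch under the same assumptions as the commands above (namespace `openstack`, pod names starting with `neutron-dhcp-agent-default-`); the function name is hypothetical. The change is made inside the running containers, so it may need to be repeated if the pods are recreated.

```shell
# Hypothetical helper: disable password aging for root and neutron
# in every DHCP agent pod. Assumes the namespace and pod name prefix
# used elsewhere in this thread.
disable_aging_on_dhcp_agents() {
    pods=$(kubectl get pods --namespace=openstack -o name \
        | grep '^pod/neutron-dhcp-agent-default-')
    for pod in $pods; do
        for user in root neutron; do
            # No -it needed: passwd -x does not prompt for anything.
            # ${pod#pod/} strips the "pod/" prefix printed by "-o name".
            kubectl exec --namespace=openstack "${pod#pod/}" -- passwd -x -1 "$user"
        done
    done
}
```

Run `disable_aging_on_dhcp_agents` once, then check one pod with `chage -l root` to confirm the change.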

       

      Now DHCP will work

       

      I saw that the configuration file behind chage, /etc/login.defs, has this value:

       

      PASS_MAX_DAYS    90

       

      In the last few days we have seen people complaining about DHCP. I think this is a release bug in the product: 90 days corresponds to the release date of 6.0, which came out to the public on 3 September but was probably built by the devs 2 or 3 weeks before that date...

       

      I'm very dissatisfied with this product. It is just full of bugs that I have to fix or work around before VMware does. WTF??

       

      Just look at this bug I have had since I started using VIO 6.0: the console has never worked well...

      Alexandre MARTINS on LinkedIn: "WTF #vmware fresh install console fully bugged VMware Integrated OpenStack 6.0 Anyone fa…

       

      Regards,

      Alexandre MARTINS.

        • 1. Re: VIO 6 DHCP Agents Bug? with solution
          OsburnM Enthusiast

          So unfortunately, while this does solve my VIO6 DHCP issue (after doing this on all the neutron-dhcp-agent-default-***, neutron-metadata-agent-default-***, and neutron-server-**-** pods), it uncovered an even bigger problem: all the services are set to a 90-day password expiry. If you simulate a hard shutdown / lights-out scenario, the deployment is unable to recover. Once you complete your deployment, you'll need to fix all of these before it reboots or restarts, or the next time you're in a major outage it won't come back.

           

          Entirely unusable product out of the box!

           

          Very sad.

          • 2. Re: VIO 6 DHCP Agents Bug? with solution
            Chandler_Zhang Lurker
            VMware Employees

            Thanks for trying VIO 6! This is a known issue; please try the following workaround. A KB article about it will be published soon:

             

            1. For the mgmt VM:

               1.1 SSH to the mgmt VM and run the following commands.

                   chage -I -1 -m 0 -M 99999 -E -1 root

                   chage -I -1 -m 0 -M 99999 -E -1 vioadmin

            2. For each controller node VM: we don't set a password for the vioadmin or root user, so you can't log in to it from the vSphere web UI. Do the following instead.

               2.1 Break in as the root user on each controller node, following the doc.

                   https://vmware.github.io/photon/assets/files/html/3.0/photon_troubleshoot/resetting-a-lost-root-password.html

               2.2 Run the following commands.

                   chage -I -1 -m 0 -M 99999 -E -1 root

                   chage -I -1 -m 0 -M 99999 -E -1 vioadmin

            3. You should then be able to generate logs and SSH into the controllers.

             

            The issue will also be fixed in the next patch release.

            • 3. Re: VIO 6 DHCP Agents Bug? with solution
              OsburnM Enthusiast

               

               

              After doing a clean re-install, and before attempting a new deployment, I unlocked root and vioadmin on the vio-manager, then modified /etc/login.defs to set the maximum password age to 99999 days. Then I opened the controller base image using the break-in method and updated the accounts there as well, along with its /etc/login.defs. Then I shut it back down.

               

              Then I went into the VIO-Manager UI and started my deployment.

               

              The deployment completes successfully and all services show as running.

               

              From there I run 'viossh controller-*******', check that 'sudo su' works, and use 'chage -l root' to verify that the root and vioadmin passwords are no longer expired.
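This kind of check can be scripted instead of read by eye. The sketch below assumes the English-language `chage -l` output format posted in this thread, where a fixed account shows `Password expires : never`. The function name is made up for illustration.

```shell
# Hypothetical check: reads `chage -l` output on stdin and succeeds only
# if it reports that the password never expires.
# Assumes the English output format shown in this thread.
password_never_expires() {
    grep -Eq '^Password expires[[:space:]]*:[[:space:]]*never$'
}
```

For example, `chage -l root | password_never_expires && echo "root: no expiry"` prints the message only for an account that has been fixed.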

               

              ***HOWEVER***

               

              ALL of the service account passwords inside the pod containers are expired! So even with all of the above, if I attempt to simulate a lights-out scenario, the deployment is unable to come back up and I have to rebuild it entirely.

               

              root@vio-manager [ ~ ]# kubectl exec -it --namespace=openstack neutron-dhcp-agent-default-nlp4h -- /bin/bash

              [root@controller-v6gkckprzs /]# chage -l root

              You are required to change your password immediately (password expired)

              chage: PAM: Authentication token is no longer valid; new one required

              [root@controller-v6gkckprzs /]# passwd -x -1 root

              passwd: password expiry information changed.

              [root@controller-v6gkckprzs /]# chage -l neutron

              Last password change                                    : Aug 26, 2019

              Password expires                                        : Nov 24, 2019

              Password inactive                                       : never

              Account expires                                         : never

              Minimum number of days between password change          : 0

              Maximum number of days between password change          : 90

              Number of days of warning before password expires       : 7

              [root@controller-v6gkckprzs /]# chage -l root

              Last password change                                    : Aug 26, 2019

              Password expires                                        : never

              Password inactive                                       : never

              Account expires                                         : never

              Minimum number of days between password change          : 0

              Maximum number of days between password change          : -1

              Number of days of warning before password expires       : 7

              [root@controller-v6gkckprzs /]#
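The `chage -l neutron` output above is what a 90-day policy predicts: 90 days after Aug 26, 2019 is Nov 24, 2019, nine days before the Dec 3 log entries in the first post. A small sketch of that arithmetic, assuming GNU date (BSD date does not support the `-d` option); the function name is made up for illustration.

```shell
# Hypothetical helper: print the date on which a password expires,
# given the last change date (YYYY-MM-DD) and the maximum age in days.
# Requires GNU date for the -d option; -u avoids daylight-saving effects.
password_expiry_date() {
    date -u -d "$1 +$2 days" +%Y-%m-%d
}

password_expiry_date 2019-08-26 90
```

The same function can be used to see when a fresh deployment would hit the problem: pass the "Last password change" date from `chage -l` and the PASS_MAX_DAYS value.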

              • 4. Re: VIO 6 DHCP Agents Bug? with solution
                MentzerJ Lurker

                Can you speak to when this patch will be released?

                • 5. Re: VIO 6 DHCP Agents Bug? with solution
                  Chandler_Zhang Lurker
                  VMware Employees

                  Thanks for the comments! We are taking this lights-out scenario into account for the upcoming patch release.

                  • 6. Re: VIO 6 DHCP Agents Bug? with solution
                    rpellet Enthusiast
                    VMware Employees

                    Please see the VMware Knowledge Base. A new build of VIO 6.0 was released that addresses the password expiration issues.