3 Replies Latest reply on Nov 7, 2017 6:43 PM by iforbes

    VCSA HA eth0 and services dont start after reboot

    robert_23 Lurker

      Hi,

      I configured a VCSA 6.5 HA deployment in our test environment. All three nodes were up and healthy, and vCenter showed them green in the HA status. First problem: after a simultaneous reboot of all three machines, the eth0 interface of the active and passive nodes stays down. Second problem: even after configuring eth0 manually and regaining IP connectivity on the public interface, the VCSA services can't be started:

       

      Command> service-control --status

      Running:

      vmware-statsmonitor vmware-vcha vmware-vmon

      Stopped:

      applmgmt lwsmd pschealth vmafdd vmcad vmcam vmdird vmdnsd vmonapi vmware-cis-license vmware-cm vmware-content-library vmware-eam vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-perfcharts vmware-psc-client vmware-rbd-watchdog vmware-rhttpproxy vmware-sca vmware-sps vmware-sts-idmd vmware-stsd vmware-updatemgr vmware-vapi-endpoint vmware-vpostgres vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vmware-vsm vsphere-client vsphere-ui

       

      Command> service-control --start --all

      Perform start operation. vmon_profile=HACore, svc_names=None, include_coreossvcs=True, include_leafossvcs=False

      2017-01-13T13:30:25.695Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'lwsmd']

      2017-01-13T13:30:25.698Z   Done running command

      Service lwsmd startup type is not automatic. Skip

      2017-01-13T13:30:25.701Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmafdd']

      2017-01-13T13:30:25.703Z   Done running command

      Service vmafdd startup type is not automatic. Skip

      2017-01-13T13:30:25.705Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmdird']

      2017-01-13T13:30:25.707Z   Done running command

      Service vmdird startup type is not automatic. Skip

      2017-01-13T13:30:25.710Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmcad']

      2017-01-13T13:30:25.712Z   Done running command

      Service vmcad startup type is not automatic. Skip

      2017-01-13T13:30:25.714Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmware-sts-idmd']

      2017-01-13T13:30:25.716Z   Done running command

      Service vmware-sts-idmd startup type is not automatic. Skip

      2017-01-13T13:30:25.719Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmware-stsd']

      2017-01-13T13:30:25.721Z   Done running command

      Service vmware-stsd startup type is not automatic. Skip

      2017-01-13T13:30:25.723Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmdnsd']

      2017-01-13T13:30:25.726Z   Done running command

      Service vmdnsd startup type is not automatic. Skip

      2017-01-13T13:30:25.728Z   Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmware-psc-client']

      2017-01-13T13:30:25.730Z   Done running command

      Service vmware-psc-client startup type is not automatic. Skip

      Successfully started vmon services. Profile HACore.
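The log above shows service-control starting only the HACore profile and skipping every service whose startup type is not automatic. A minimal sketch for pulling the skipped service names out of saved output, assuming the line format matches what is pasted above (the function name and the start.log file name are made up for illustration):

```shell
#!/bin/sh
# Print the names of services that service-control reported as skipped.
# Assumes log lines shaped like the ones in this thread:
#   "Service <name> startup type is not automatic. Skip"
skipped_services() {
    sed -n 's/^[[:space:]]*Service \([^[:space:]]*\) startup type is not automatic\. Skip.*$/\1/p'
}

# Example (start.log is a hypothetical file holding the saved output):
# skipped_services < start.log
```

On the output pasted above this would list lwsmd, vmafdd, vmdird, vmcad, vmware-sts-idmd, vmware-stsd, vmdnsd and vmware-psc-client, one per line.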

       

      /etc/systemd/network/10-eth0.network.manual:

      [Match]

      Name=eth0

      [Network]

      Gateway=10.45.128.1

      Address=10.45.128.32/24

      DHCP=no

      [DHCP]

      UseDNS=false

       

      /etc/systemd/network/10-eth1.network

      [Match]

      Name=eth1

      [Network]

      Address=192.168.64.204/23

      DHCP=no

      [DHCP]

      UseDNS=false

       

      I have now shut down the passive node to try to bring the active and witness nodes back up, but the problems persist. Pinging between the HA interfaces of the active and witness nodes works. Why is eth0 down on both nodes after boot, and why can't I start the services?

       

      edit:

      networkctl status eth0

      ● 2: eth0

             Link File: /usr/lib/systemd/network/99-default.link

          Network File: n/a

                  Type: ether

                 State: routable (unmanaged)

                  Path: pci-0000:03:00.0

                Driver: vmxnet3

                Vendor: VMware

                 Model: VMXNET3 Ethernet Controller

            HW Address: 00:0c:29:7e:7e:08 (VMware, Inc.)

                   MTU: 1500

               Address: 10.45.128.32

               Gateway: 10.45.128.1 (ICANN, IANA Department)

       

      Network File: n/a and State: unmanaged? But I have the file /etc/systemd/network/10-eth0.network.manual, which was created by vCenter. How can I fix this?
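One likely reason for "Network File: n/a": systemd-networkd only loads interface configuration from files whose names end in .network, so a file named 10-eth0.network.manual is not applied. A small sketch for checking which files in the directory qualify (the function name is made up for illustration; the default path is the one from this post):

```shell
#!/bin/sh
# Classify files in a networkd config directory by extension.
# "network" = name ends in .network, so networkd can use it for a link.
# "ignored" = any other name, e.g. 10-eth0.network.manual
#             (ignored as a .network file; .link/.netdev are separate).
list_network_files() {
    dir=${1:-/etc/systemd/network}
    for f in "$dir"/*; do
        [ -e "$f" ] || continue          # directory is empty
        case "$f" in
            *.network) printf '%s %s\n' network "${f##*/}" ;;
            *)         printf '%s %s\n' ignored "${f##*/}" ;;
        esac
    done
}

# Example:
# list_network_files /etc/systemd/network
```

This only explains why networkd leaves eth0 unmanaged. It does not say what renamed the file, so renaming it back by hand may not survive a reboot.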

        • 1. Re: VCSA HA eth0 and services dont start after reboot
          sscoodd Novice

          Exactly the same thing happened to me after a failed HA cluster installation.

           

          I destroyed the failed cluster, deleted the passive and witness nodes, restarted the active node, and lost the management IP.

           

          I did some experiments with /etc/systemd/network/10-eth0.network.manual

           

          root@vcsa1 [ /etc/systemd/network ]# mv 10-eth0.network.manual 10-eth0.network
          root@vcsa1 [ /etc/systemd/network ]# systemctl restart systemd-networkd
          root@vcsa1 [ /etc/systemd/network ]# networkctl

          Got this:

          IDX LINK             TYPE               OPERATIONAL SETUP
            1 lo               loopback           carrier     unmanaged
            2 eth0             ether              routable    configured


          eth0 is up, pings fine.

           

          After a reboot the network is down again, and 10-eth0.network has been renamed back to 10-eth0.network.manual.

          Fine.

          root@vcsa1 [ /etc/systemd/network ]# cp 10-eth0.network.manual 20-eth0.network
          root@vcsa1 [ /etc/systemd/network ]# systemctl restart systemd-networkd


          Network is up.

           

          I rebooted the VCSA while pinging it from outside. It starts up, I see a couple of pings, and then the pings stop again.

          I checked: the copied 20-eth0.network is still there, but something brings the network down during boot.

           

          root@vcsa1 [ /etc/systemd/network ]# systemctl restart systemd-networkd

           

          Makes it work again.
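The manual check-and-restart above can be sketched as a small helper. This is only a stopgap that mirrors the manual restart, not a fix for whatever takes the network down during boot. It assumes the five-column networkctl listing shown earlier in this post; the function name is made up for illustration:

```shell
#!/bin/sh
# Read `networkctl` list output on stdin and print every non-loopback
# link whose SETUP column is not "configured".
# Assumes the column layout: IDX LINK TYPE OPERATIONAL SETUP
unconfigured_links() {
    awk 'NF == 5 && $1 ~ /^[0-9]+$/ && $2 != "lo" && $NF != "configured" { print $2 }'
}

# Example: restart networkd only if some link is not configured.
# if [ -n "$(networkctl | unconfigured_links)" ]; then
#     systemctl restart systemd-networkd
# fi
```

The NF == 5 guard skips the header and the "N links listed." footer, so only real link rows are considered.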

           

          Just out of curiosity:

           

          root@vcsa1 [ /etc/systemd/network ]# /etc/rc.d/init.d/network start

           

          Starting network (via systemctl):  Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.

                                                                    [FAILED]


          root@vcsa1 [ /etc/systemd/network ]# systemctl status network.service


          ● network.service - LSB: Bring up/down networking

            Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: enabled)

            Active: failed (Result: exit-code) since Wed 2017-02-08 10:28:37 UTC; 10s ago

              Docs: man:systemd-sysv-generator(8)

            Process: 2345 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=6)

          Feb 08 10:28:37 vcsa1.lab.local systemd[1]: Starting LSB: Bring up/down networking...

          Feb 08 10:28:37 vcsa1.lab.local systemd[1]: network.service: Control process exited, code=exited status=6

          Feb 08 10:28:37 vcsa1.lab.local systemd[1]: Failed to start LSB: Bring up/down networking.

          Feb 08 10:28:37 vcsa1.lab.local systemd[1]: network.service: Unit entered failed state.

          Feb 08 10:28:37 vcsa1.lab.local systemd[1]: network.service: Failed with result 'exit-code'.

           

          I'd really like to get this fixed, since it's the second time I've redeployed the VCSA and run into the same problem.

          Anyone?

           

          UPDATE.

           

          It seems that the vCenter HA cluster was destroyed incorrectly.

          Destroying it manually on the active node did the trick.

           

          root@vcsa1 [ ~ ]# destroy-vcha
          Caution: This will remove all vCenter HA related configuration from the current node and it cannot be reused to form a vCenter HA cluster unless this is the Active node.
          Confirm to proceed? (y/n): y
          logs available at: /var/log/vmware/vcha
          2017-02-08T13:31:44.935Z   Successfully updated starttype: DISABLED for service vcha
          2017-02-08T13:31:50.644Z   Running command: ['/usr/lib/applmgmt/networking/bin/firewall-reload']
          2017-02-08T13:31:50.778Z   Done running command
          Skip not found service - vmware-stsd
          Skip not found service - vmware-sts-idmd
          Skip not found service - vmdnsd
          Skip not found service - vmdird
          Skip not found service - vmcad
          Skip not found service - vmware-psc-client
          

           

          Reboot and you're done.
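After the reboot it is worth confirming that the services actually came back. A minimal sketch for listing what is still stopped, assuming `service-control --status` prints "Running:" and "Stopped:" headings followed by space-separated names, as in the first post (the function name is made up for illustration):

```shell
#!/bin/sh
# Read `service-control --status` output on stdin and print the services
# listed under the "Stopped:" heading, one per line.
stopped_services() {
    awk '
        /^[[:space:]]*Running:/ { section = "running"; next }
        /^[[:space:]]*Stopped:/ { section = "stopped"; next }
        section == "stopped"    { for (i = 1; i <= NF; i++) print $i }
    '
}

# Example:
# service-control --status | stopped_services
```

Some services may be stopped by design, so a non-empty list is not an error by itself. The thing to look for is core services such as vmware-vpxd or vmware-vpostgres still showing up here.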

          • 2. Re: VCSA HA eth0 and services dont start after reboot
            billyjoebob12 Lurker

            +1 for

             

            # destroy-vcha  
            
            • 3. Re: VCSA HA eth0 and services dont start after reboot
              iforbes Hot Shot

              THANK YOU!!!!! I thought I'd be proactive and deploy VCSA HA. What a nightmare! VMware needs to pull this feature until it's fixed. I lost eth0 like you guys and spent the last couple of hours troubleshooting. I was about to rebuild the VCSA when I came across this post. Saved the day.