2 Replies Latest reply on Oct 25, 2017 7:35 AM by ArrowSIVAC

    vCenter 6.5u1 - HA Deployment Fails - Unable to remove NIC

    ArrowSIVAC Enthusiast

      Basic installation of fresh OVA vCenter.

       

      eth0 setup and cluster built. Nodes added. No issue.

       

      Next we tried to add vCenter HA, using the "Basic" option for a simple deployment.

       

      Production network: 172.20.13.0/24

       

      HA network: 172.20.17.0/24

       

      We first ran the basic HA wizard and got the error shown in the attached screenshot (vCenterHADeployFail1.png).

       

       

      The error gave very little detail about what caused it. We tried running the wizard again and this time got an error that the IP was already in use. The first run had created eth1 on the vCenter appliance and bound the IP to it, then failed somewhere further along in the process.

       

      Wanting to re-use the IP, we tried to remove the NIC cleanly. In the management UI (https://vcenter:5480) we set eth1 to IPv6 only, with no IPv4 address.

      This failed. We ended up setting the eth1 configuration to another unused IP and rebooting.

       

      Now when the system boots it brings up only eth1, not eth0.

       

      So we tried to do the same thing as in the GUI from the console, using

      /opt/vmware/share/vami/vami_config_net

      No change. It would not switch eth1 to IPv6 only with no IPv4, which we wanted so that we could then remove the NIC at the host level (editing the VM in ESXi to delete the NIC).

       

      After several attempts to clear out eth1, we removed the NIC from the VM and also cleared out /etc/systemd/network/10-eth1.network.manual

       

      eth1 is gone, but eth0 no longer starts automatically.

      I searched for examples or a manual for the vami_* commands, which seem to control the appliance's host configuration, and found nothing of value.

       

       

      Questions:

      1) How do you properly remove a NIC added by the HA wizard when the wizard fails?

      2) Are there any details on what this PNID issue is?

      3) Any ideas on how to repair eth0 so it starts on boot?

       

      Thanks,

        • 1. Re: vCenter 6.5u1 - HA Deployment Fails - Unable to remove NIC
          ArrowSIVAC Enthusiast

          Update:

           

          1) How do you properly remove a NIC added by the HA wizard when it fails?

                    Related thread: "VCSA HA eth0 and services dont start after reboot"

                    Command used: "destroy-vcha"

          2) Any details on what this PNID issue is?

          3) Any ideas on how to repair eth0 so it starts on boot?

                    -> I don't have a baseline, as I ran the "destroy-vcha" command first, but renaming "/etc/systemd/network/10-eth0.network.manual" to "/etc/systemd/network/10-eth0.network" appears to be what got eth0 starting on boot again.

           

           

           

          This may be the correct fix, or only a partial one. Input or validation would be appreciated. The VM is now booting fine with eth0.
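The rename described in item 3 can be sketched as a small script. This is only an illustration under stated assumptions: the function name `restore_network_unit` is hypothetical, the `10-<iface>.network.manual` file name pattern is taken from the paths in this thread, and the reasoning is that systemd-networkd only loads files ending in `.network`, so a unit parked under a `.manual` suffix is ignored at boot. It has not been validated against a VCSA.

```python
from pathlib import Path


def restore_network_unit(network_dir: str, iface: str = "eth0") -> bool:
    """Rename 10-<iface>.network.manual back to 10-<iface>.network.

    Returns True if a rename happened, False if there was nothing to do.
    The function name and defaults are illustrative, not part of any VMware tool.
    """
    directory = Path(network_dir)
    parked = directory / f"10-{iface}.network.manual"
    active = directory / f"10-{iface}.network"

    # Leave things alone if there is no parked unit, or if an active
    # unit already exists and would be overwritten by the rename.
    if not parked.is_file() or active.exists():
        return False

    parked.rename(active)
    return True
```

On the appliance this would be called as root with the directory from the post, for example `restore_network_unit("/etc/systemd/network")`, followed by a reboot or a restart of the network service. It is equivalent to a single `mv`, with a guard against clobbering an existing unit file.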

          • 2. Re: vCenter 6.5u1 - HA Deployment Fails - Unable to remove NIC
            ArrowSIVAC Enthusiast

            I don't have time right now to validate this, but I believe I understand the issue.

             

            The point of "Override Management Network on Failover" would be a design where you fail over to another VLAN / network segment (example: Prod 172.20.13.0/24 and DR site 172.20.12.0/24).

             

            When we ran the wizard, we did not think it would be an issue to put the "override" target IP on the SAME network as the current vCenter IP. Example: current vCenter public IP 172.20.13.99/24 with "override IP" set to 172.20.13.101/24.

             

            We did this because that was the old IP we had on the second vCenter server back in 5.x, when HA was not around and the second server was more of a quick-recovery method.

             

             

            On the next attempt we did NOT check that box (after running the cleanup noted above for the NIC that the wizard created but did not remove when it failed), and HA built out as expected.

             

            If the above assumptions are correct, VMware has the following bugs:

             

            1) The wizard needs a check so that, if "Override" is selected, it either does not allow the override IP to be on the same VLAN as the current user-facing IP, or allows it without failing.

            2) The wizard needs to back out the NIC add and IP binding commands if the HA deployment fails.

            3) The management portal needs a function to unbind an IP and remove a secondary NIC from the appliance.
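The check suggested in item 1 could look roughly like the sketch below. It uses Python's standard `ipaddress` module and the example addresses from this thread. The function name `override_conflicts` is hypothetical, and this only illustrates the same-subnet test; it is not how the vCenter HA wizard actually validates its input, and the same-subnet theory itself is still unconfirmed.

```python
import ipaddress


def override_conflicts(current: str, override: str) -> bool:
    """Return True if the override address lies in the current address's network.

    Both arguments are interface strings in "address/prefix" form,
    for example "172.20.13.99/24".
    """
    current_if = ipaddress.ip_interface(current)
    override_if = ipaddress.ip_interface(override)
    # Compare the override host address against the network that the
    # current management address belongs to.
    return override_if.ip in current_if.network


# The combination that was entered in the wizard: same /24 as the current IP.
print(override_conflicts("172.20.13.99/24", "172.20.13.101/24"))  # True

# An override address on the DR segment mentioned above.
print(override_conflicts("172.20.13.99/24", "172.20.12.101/24"))  # False
```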