VMware Cloud Community
ArrowSIVAC
Enthusiast

vCenter 6.5u1 - HA Deployment Fails - Unable to remove NIC

Basic installation of a fresh vCenter appliance from the OVA.

eth0 was set up and the cluster built. Nodes added. No issues.

Now trying to add vCenter HA, using the "Basic" option for a simple deployment.

Production network: 172.20.13.0/24

HA network: 172.20.17.0/24

We first ran the HA wizard in Basic mode and got the following error (screenshot):

vCenterHADeployFail1.png

There was very sparse detail on what caused this. We made an attempt to run the wizard again and then got an error about the IP already being in use. The first round had created an eth1 on the vCenter appliance and bound the IP, then failed somewhere downstream in the process.

Wanting to re-use the IP, we tried to remove the NIC cleanly. We went into the management UI at //vcenter:5480

and set eth1 to no IPv4, IPv6 only.

This fails. We ended up setting the NIC configuration to another unused IP and rebooting.

Now the system boots up and does not start eth0, just eth1.

So we tried to do the same thing as in the GUI via

/opt/vmware/share/vami/vami_config_net

No change. It would not switch eth1 to no IPv4 with IPv6 only, which we wanted so that we could then remove the NIC at the VM host level (ESXi modification of the VM to delete the NIC).

After several attempts to clear out eth1, we removed the NIC from the VM and also cleared out: /etc/systemd/network/10-eth1.network.manual

eth1 is gone, but eth0 no longer starts automatically.

pastedImage_3.png

I googled all around for examples or a manual for the vami_* commands, which seem to control the host configuration. I found nothing of value.

Questions:

1) How do you properly remove a NIC added by the HA wizard when the deployment fails?

2) Are there any details on what this PNID issue is?

3) Any ideas on how to repair eth0 so it starts on boot?

Thanks,

2 Replies
ArrowSIVAC
Enthusiast

Update:

1) How do you properly remove a NIC added by the HA wizard when the deployment fails?

          -> Reference: "VCSA HA eth0 and services dont start after reboot"

          -> Command used: "destroy-vcha"

2) Are there any details on what this PNID issue is?

3) Any ideas on how to repair eth0 so it starts on boot?

          -> I don't have a baseline, as I ran the "destroy-vcha" command first, but renaming "/etc/systemd/network/10-eth0.network.manual" to "/etc/systemd/network/10-eth0.network" appears to be what lets eth0 start on boot again.

This may be correct, or only part of the fix. Input or validation appreciated. The VM is now booting fine with eth0.
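For anyone who wants to script the rename from item 3, here is a minimal sketch in Python. It only mirrors the rename described above; the function name restore_network_unit, the iface parameter, and the dry-run default are my own illustration, not a VMware tool, and I have not validated it on an appliance. Whether a restart of networking or a reboot is needed afterwards is not covered.

```python
from pathlib import Path


def restore_network_unit(network_dir, iface="eth0", dry_run=True):
    """Rename 10-<iface>.network.manual back to 10-<iface>.network.

    Hypothetical helper: it only reproduces the manual rename from the post.
    Returns (source, target) when a rename is (or would be) done, else None.
    """
    directory = Path(network_dir)
    source = directory / f"10-{iface}.network.manual"
    target = directory / f"10-{iface}.network"

    if not source.exists():
        return None  # no ".manual" file for this interface, nothing to do
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting")

    if not dry_run:
        source.rename(target)
    return source, target
```

Calling restore_network_unit("/etc/systemd/network") with the default dry_run=True only reports which file would be renamed, so it can be checked before anything is changed. Take a snapshot of the VM first.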

ArrowSIVAC
Enthusiast

I don't have time right now to validate this, but I believe I understand the issue.

The point of "Override Management Network on Failover" would be a design where you fail over to another VLAN / network segment (example: Prod 172.20.13.0/24 and DR site 172.20.12.0/24).

When we ran the wizard, we did not think it would be an issue to enter an "override" target IP on the SAME network as the current vCenter IP. Example: current vCenter public IP 172.20.13.99/24 with the "override IP" set to 172.20.13.101/24.

We did this because that was the old IP we had on the second vCenter server, back in 5.x when HA was not around and the second server was more of a quick resource recovery methodology.

pastedImage_0.png

We did NOT check that box this time (after running the noted cleanup of the NIC that the wizard had created but did not remove when the script failed), and HA built out as expected.

If the above assumptions are correct, VMware has the following bugs:

1) If "Override" is selected, the wizard needs to check that the override IP is not on the same VLAN / network as the current user-facing IP, or else the deployment should allow it without failing.

2) The wizard needs to back out the NIC add and IP binding if the HA deployment fails.

3) The management portal needs a function to unbind an IP and remove a secondary NIC from the appliance.
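To make item 1 concrete, this is roughly the pre-check I would expect, sketched in Python with the standard ipaddress module. It assumes my guess about the cause is right, and the function names are made up for illustration; this is not how the wizard actually validates its input.

```python
import ipaddress


def override_on_same_network(current_cidr, override_cidr):
    """Return True if the override IP falls inside the current management network."""
    current = ipaddress.ip_interface(current_cidr)
    override = ipaddress.ip_interface(override_cidr)
    return override.ip in current.network


def networks_overlap(net_a, net_b):
    """Return True if two networks (e.g. production and HA) share any addresses."""
    return ipaddress.ip_network(net_a).overlaps(ipaddress.ip_network(net_b))
```

With the addresses from this thread, override_on_same_network("172.20.13.99/24", "172.20.13.101/24") is True (the case that failed for us), while an override on the DR example network 172.20.12.0/24 returns False. networks_overlap("172.20.13.0/24", "172.20.17.0/24") is False, so the production and HA networks themselves do not collide.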
