Hi,
I configured a VCSA 6.5 HA deployment in our test environment. All three nodes were up and fine; vCenter showed them green in the HA status. First problem: after a simultaneous reboot of all three machines, the eth0 interface of the master and passive nodes stays down. Second problem: even after configuring eth0 manually and successfully regaining IP connectivity on the public interface, the VCSA services can't be started:
Command> service-control --status
Running:
vmware-statsmonitor vmware-vcha vmware-vmon
Stopped:
applmgmt lwsmd pschealth vmafdd vmcad vmcam vmdird vmdnsd vmonapi vmware-cis-license vmware-cm vmware-content-library vmware-eam vmware-imagebuilder vmware-mbcs vmware-netdumper vmware-perfcharts vmware-psc-client vmware-rbd-watchdog vmware-rhttpproxy vmware-sca vmware-sps vmware-sts-idmd vmware-stsd vmware-updatemgr vmware-vapi-endpoint vmware-vpostgres vmware-vpxd vmware-vpxd-svcs vmware-vsan-health vmware-vsm vsphere-client vsphere-ui
Command> service-control --start --all
Perform start operation. vmon_profile=HACore, svc_names=None, include_coreossvcs=True, include_leafossvcs=False
2017-01-13T13:30:25.695Z Running command: ['/usr/bin/systemctl', 'is-enabled', u'lwsmd']
2017-01-13T13:30:25.698Z Done running command
Service lwsmd startup type is not automatic. Skip
2017-01-13T13:30:25.701Z Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmafdd']
2017-01-13T13:30:25.703Z Done running command
Service vmafdd startup type is not automatic. Skip
2017-01-13T13:30:25.705Z Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmdird']
2017-01-13T13:30:25.707Z Done running command
Service vmdird startup type is not automatic. Skip
2017-01-13T13:30:25.710Z Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmcad']
2017-01-13T13:30:25.712Z Done running command
Service vmcad startup type is not automatic. Skip
2017-01-13T13:30:25.714Z Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmware-sts-idmd']
2017-01-13T13:30:25.716Z Done running command
Service vmware-sts-idmd startup type is not automatic. Skip
2017-01-13T13:30:25.719Z Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmware-stsd']
2017-01-13T13:30:25.721Z Done running command
Service vmware-stsd startup type is not automatic. Skip
2017-01-13T13:30:25.723Z Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmdnsd']
2017-01-13T13:30:25.726Z Done running command
Service vmdnsd startup type is not automatic. Skip
2017-01-13T13:30:25.728Z Running command: ['/usr/bin/systemctl', 'is-enabled', u'vmware-psc-client']
2017-01-13T13:30:25.730Z Done running command
Service vmware-psc-client startup type is not automatic. Skip
Successfully started vmon services. Profile HACore.
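Side note on all the "startup type is not automatic. Skip" lines: service-control runs "systemctl is-enabled" (as the log shows) and skips anything not enabled for automatic start. You can repeat the same check by hand; don't blindly re-enable these services, though, since (as it turns out below) the disabled startup types are a symptom of the broken vCenter HA state, not the cause:

# repeat the check service-control does, for each skipped service
for s in lwsmd vmafdd vmcad vmdird vmdnsd vmware-stsd vmware-sts-idmd vmware-psc-client; do
    printf '%-20s ' "$s"; systemctl is-enabled "$s"
done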
/etc/systemd/network/10-eth0.network.manual:
[Match]
Name=eth0
[Network]
Gateway=10.45.128.1
Address=10.45.128.32/24
DHCP=no
[DHCP]
UseDNS=false
/etc/systemd/network/10-eth1.network:
[Match]
Name=eth1
[Network]
Address=192.168.64.204/23
DHCP=no
[DHCP]
UseDNS=false
Now I shut the passive node down to get the active and witness nodes up again, but the problems persist. Pinging between the active and witness nodes' HA interfaces works. Why is eth0 down on both nodes after boot, and why can't I start the services?
edit:
networkctl status eth0
● 2: eth0
Link File: /usr/lib/systemd/network/99-default.link
Network File: n/a
Type: ether
State: routable (unmanaged)
Path: pci-0000:03:00.0
Driver: vmxnet3
Vendor: VMware
Model: VMXNET3 Ethernet Controller
HW Address: 00:0c:29:7e:7e:08 (VMware, Inc.)
MTU: 1500
Address: 10.45.128.32
Gateway: 10.45.128.1 (ICANN, IANA Department)
Network File: n/a and State: unmanaged??? But I have the file /etc/systemd/network/10-eth0.network.manual, which was created by vCenter. How can I fix this?
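For what it's worth, systemd-networkd only applies files whose names end in .network, so the .manual suffix hides the eth0 config from it entirely — note that eth1, whose file is a plain 10-eth1.network, comes up fine. That would explain both "Network File: n/a" and the unmanaged state:

ls /etc/systemd/network/
10-eth0.network.manual  10-eth1.network    # only the eth1 file matches *.network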
Exactly the same thing here, after a failed HA cluster installation.
I destroyed the failed cluster, deleted the passive and witness nodes, restarted the active node, and lost the management IP.
I did some experiments with /etc/systemd/network/10-eth0.network.manual:
root@vcsa1 [ /etc/systemd/network ]# mv 10-eth0.network.manual 10-eth0.network
root@vcsa1 [ /etc/systemd/network ]# systemctl restart systemd-networkd
root@vcsa1 [ /etc/systemd/network ]# networkctl
Got this:
IDX LINK TYPE OPERATIONAL SETUP
1 lo loopback carrier unmanaged
2 eth0 ether routable configured
eth0 is up, pings fine.
After a reboot, the network is down again and 10-eth0.network got renamed back to 10-eth0.network.manual.
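If you want to hunt down whatever renames the file at boot (presumably a vCenter HA script — the search paths below are just guesses), the distinctive suffix is easy to grep for:

grep -rl "network.manual" /etc /opt/vmware /usr/lib 2>/dev/null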
Fine.
root@vcsa1 [ /etc/systemd/network ]# cp 10-eth0.network.manual 20-eth0.network
root@vcsa1 [ /etc/systemd/network ]# systemctl restart systemd-networkd
Network is up.
I rebooted the VCSA while pinging it from outside. It starts up, I see a couple of pings, and then no pings again.
The renamed 20-eth0.network is still there, but something brings the network down during boot.
root@vcsa1 [ /etc/systemd/network ]# systemctl restart systemd-networkd
Makes it work again.
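A crude workaround, until the real culprit is removed, would be a oneshot unit that kicks networkd once, late in boot — a minimal sketch, not tested on a VCSA, with a made-up unit name:

/etc/systemd/system/kick-networkd.service:
[Unit]
Description=Restart systemd-networkd after boot scripts take eth0 down
After=multi-user.target
[Service]
Type=oneshot
ExecStart=/usr/bin/systemctl restart systemd-networkd
[Install]
WantedBy=multi-user.target

Enable it with systemctl enable kick-networkd.service.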
Just out of curiosity:
root@vcsa1 [ /etc/systemd/network ]# /etc/rc.d/init.d/network start
Starting network (via systemctl): Job for network.service failed because the control process exited with error code. See "systemctl status network.service" and "journalctl -xe" for details.
[FAILED]
root@vcsa1 [ /etc/systemd/network ]# systemctl status network.service
● network.service - LSB: Bring up/down networking
Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: enabled)
Active: failed (Result: exit-code) since Wed 2017-02-08 10:28:37 UTC; 10s ago
Docs: man:systemd-sysv-generator(8)
Process: 2345 ExecStart=/etc/rc.d/init.d/network start (code=exited, status=6)
Feb 08 10:28:37 vcsa1.lab.local systemd[1]: Starting LSB: Bring up/down networking...
Feb 08 10:28:37 vcsa1.lab.local systemd[1]: network.service: Control process exited, code=exited status=6
Feb 08 10:28:37 vcsa1.lab.local systemd[1]: Failed to start LSB: Bring up/down networking.
Feb 08 10:28:37 vcsa1.lab.local systemd[1]: network.service: Unit entered failed state.
Feb 08 10:28:37 vcsa1.lab.local systemd[1]: network.service: Failed with result 'exit-code'.
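That failure is a red herring, by the way: the VCSA runs Photon OS, where networking is handled by systemd-networkd, and exit status 6 is the LSB code for "program is not configured" — so the legacy init script is expected to fail. The unit actually worth checking is:

systemctl status systemd-networkd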
I really want to fix this, since this is the second time I've redeployed the VCSA and run into the same problem.
Anyone?
UPDATE.
It seems that the vCenter HA cluster was destroyed incorrectly.
Destroying it manually on the active node did the trick:
root@vcsa1 [ ~ ]# destroy-vcha
Caution: This will remove all vCenter HA related configuration from the current node and it cannot be reused to form a vCenter HA cluster unless this is the Active node.
Confirm to proceed? (y/n): y
logs available at: /var/log/vmware/vcha
2017-02-08T13:31:44.935Z Successfully updated starttype: DISABLED for service vcha
2017-02-08T13:31:50.644Z Running command: ['/usr/lib/applmgmt/networking/bin/firewall-reload']
2017-02-08T13:31:50.778Z Done running command
Skip not found service - vmware-stsd
Skip not found service - vmware-sts-idmd
Skip not found service - vmdnsd
Skip not found service - vmdird
Skip not found service - vmcad
Skip not found service - vmware-psc-client
Reboot and you're done.
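After the reboot you can verify the fix took:

networkctl status eth0     # State should now read "routable (configured)"
service-control --status   # the previously stopped services should be running again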
+1 for
# destroy-vcha
THANK YOU!!!!! I thought I'd be proactive and deploy VCSA HA. What a nightmare! VMware needs to pull this feature until it's fixed. I lost eth0 like you guys and spent the last couple of hours troubleshooting. I was about to rebuild the VCSA when I came across this post. Saved the day!
Worked for me too!
For VCSA 8.0:
vcha-destroy -f