While adding some new ESXi hosts to a cluster last week, I ran into a dumb problem.
These hosts were meant to replace all the existing ones in the cluster, so I did the usual: add them, check and remediate host profile compliance, reboot, exit maintenance mode, and start vMotioning VMs onto them.
A few minutes later, I got some (well, a lot of :-)) calls from users saying that their VMs were unreachable.
At this point, I went to Configuration / Network on one of my new ESXi hosts and ...
As you can see, the two physical adapters backing the VM vSwitch were down (this is not the purpose of this topic, but it was related to UCS networking: when a vNIC carries VLANs that belong to disjoint L2 networks, UCS brings the related vNIC down).
I'm now trying to find a robust solution (meaning no human action, such as migrating a test VM and trying to ping it) to prevent this, at least for vmnic status!
What are we looking for?
A mechanism that works in every case (that's important because alarms sometimes don't trigger) to alert us when a vmnic is down. Some kind of big warning that gets our attention (in the Summary tab for example, like the HA warnings).
Ideally, if an entire vSwitch is down, it should not be possible to put the host into production.
What have we already tried?
- Default network alarms: they work when the ESXi host is already connected to vCenter and a vmnic changes state, but not when the host is newly added to vCenter with its physical adapters already down. In my test of this case, no alarm was triggered (I think because the alarm is event-based, and the event occurred before the host was added to vCenter). Same thing when I apply the host profile and reboot the host.
- Configuring an alarm based on the esx.problem.net.vmnic.linkstate.down and esx.clear.net.vmnic.linkstate.up events: same result as before, the alarm is not triggered when the ESXi host is added.
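
Since both attempts above depend on an event being seen by vCenter, one possible direction is to check the current link state instead of waiting for an event. The following is only a minimal sketch in Python, not a tested solution: it assumes the host object looks like a pyVmomi `vim.HostSystem`, where `config.network.pnic` lists the physical adapters, `config.network.vswitch` lists the standard vSwitches with their uplink keys, and a pnic whose `linkSpeed` is unset has its link down. The function name `find_down_uplinks` and the shape of the returned report are made up for illustration, and distributed switches are not covered.

```python
def find_down_uplinks(host):
    """Report standard vSwitches that have at least one uplink with no link.

    `host` is assumed to expose the same attributes as a pyVmomi
    vim.HostSystem: config.network.pnic and config.network.vswitch.
    Returns {vswitch_name: {"down": [vmnic names], "all_down": bool}}.
    """
    network = host.config.network

    # Index physical adapters by their key, because a vSwitch references
    # its uplinks by key rather than by device name.
    pnic_by_key = {pnic.key: pnic for pnic in network.pnic}

    report = {}
    for vswitch in network.vswitch:
        uplinks = [pnic_by_key[key]
                   for key in (vswitch.pnic or [])
                   if key in pnic_by_key]

        # Assumption: linkSpeed is unset (None) when the link is down.
        down = [pnic.device for pnic in uplinks if pnic.linkSpeed is None]

        if down:
            report[vswitch.name] = {
                "down": down,
                # True when every uplink of this vSwitch is down.
                "all_down": len(down) == len(uplinks),
            }
    return report
```

Run against each host before it leaves maintenance mode, a non-empty report would be the "big warning", and an entry with `all_down` set to true would be the case where the host should not go into production at all. Because it reads state rather than events, it should not matter whether the adapters went down before or after the host was added to vCenter.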