Is anyone else getting "503 Service Unavailable (Failed to connect to endpoint: [xxxxxxxxxxxxx] _serverNamespace = / action = Allow _port = 8309)" errors on their ESXi hosts after upgrading to vSphere 7.0 U3 or U3a?
Multiple ESXi hosts show the same problem, and even my newly built vSAN witness appliance (deployed as 7.0 U2) has it after upgrading to 7.0 U3/U3a.
The hosts appear as disconnected in vCenter and the web client on each host displays the error above during the time the servers have this issue.
There appears to be no impact to VM networks or other functionality during this time.
Every 12-48 hours all of my hosts experience this problem, and the only fix is to reboot them; they then work normally for xx period of time before going back to this state.
There have been no other changes to my environment in the last few months besides going to U3/U3a, so I suspect the upgrade is causing the issue.
Anyone else in the same situation?
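For anyone who wants to pin down exactly when a host drops into this state, here is a minimal Python sketch that polls the host's web interface and reports whether it answers normally, answers with a 503, or does not answer at all. The `/ui/` path, the function names and the state labels are my own assumptions for illustration; certificate checks are disabled because hosts typically use self-signed certificates.

```python
import ssl
import sys
import urllib.error
import urllib.request


def classify(status):
    """Map an HTTP status code (or None for no response) to a coarse state."""
    if status is None:
        return "unreachable"
    if status == 503:
        return "service-unavailable"
    return "responding"


def probe(host, timeout=5.0):
    """Return the HTTP status of https://<host>/ui/, or None if nothing answers."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # assumption: self-signed host certificate
    ctx.verify_mode = ssl.CERT_NONE
    url = f"https://{host}/ui/"      # assumption: host web client path
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except OSError:
        # Connection refused, timeout, TLS failure, DNS failure, etc.
        return None


if __name__ == "__main__":
    # Usage: python probe.py esx01.example esx02.example
    for name in sys.argv[1:]:
        print(name, classify(probe(name)))
```

Run from cron or a loop on an admin box, this gives timestamps to line up against the host logs.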
In the community I have seen people hitting bugs after upgrading to 7.0 U3, which may or may not be your case. However, most of them were solved by updating to U3a.
Correct, my hardware is Cisco B200M4 and C220M5, and all components are on the HCL 🙂
vCenter is on 7.0 U3a; the C220M5 servers and the vSAN witness appliance are also on 7.0 U3a.
The B200M4 servers are still on 7.0 U3.
Next time one crashes I'll gather those logs and have a look.
I've read that NTP can be an issue; time drift could be causing your problems. I'm a little leery of upgrading my UCS blades to U3 at the moment. I upgraded vCenter, but not my ESXi hosts; they are still on U2.
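If you want a quick sanity check for time drift, below is a rough Python sketch that sends a basic SNTP client request and compares the server's transmit time with the local clock. It measures the machine it runs on, so to judge a host you would still compare against the output of `date` on that host. The function names are hypothetical, and the offset ignores network delay, so treat the result as approximate.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def parse_transmit_time(packet):
    """Extract the server transmit timestamp (Unix seconds) from an NTP reply."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    # Bytes 40-47 hold the transmit timestamp: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_offset(server, timeout=3.0):
    """Approximate offset in seconds: server clock minus local clock."""
    request = b"\x1b" + 47 * b"\0"  # LI=0, version 3, mode 3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 123))
        reply, _ = sock.recvfrom(512)
    return parse_transmit_time(reply) - time.time()
```

Calling `query_offset("your.ntp.server")` against the same NTP source the hosts use shows whether that source itself looks sane.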
I tried many solutions people have posted here and it seems a combination of them worked for me.
I restarted the management agents, ran "services.sh restart" a few times, and checked the date on the box.
I changed the hostname of the machine (simply added a -) and restarted the management agents; that alone didn't seem to do anything. After that I ran "services.sh restart" again and the machine was instantly back talking to vCenter. So for me it was a combination of the hostname change (I changed it back, by the way) and then running services.sh restart again.
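For reference, the restart part of the steps above could be scripted roughly like this in Python. This is only a sketch under assumptions: SSH is enabled on the host, "restarted mgmt agents" is taken to mean the hostd and vpxa init scripts, and the function names are made up for illustration. The hostname change is left out on purpose. By default it only prints what it would run.

```python
import shlex
import subprocess

# Assumption: "restarting the management agents" means restarting hostd and vpxa.
RECOVERY_STEPS = [
    "date",                       # check the clock on the box first
    "/etc/init.d/hostd restart",
    "/etc/init.d/vpxa restart",
    "services.sh restart",
]


def build_ssh_command(host, remote_command, user="root"):
    """Build the argv list that runs one command on the host over SSH."""
    return ["ssh", f"{user}@{host}", remote_command]


def run_recovery(host, dry_run=True):
    """Print (dry run) or execute each recovery step in order; return the argv lists."""
    commands = [build_ssh_command(host, step) for step in RECOVERY_STEPS]
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            # check=True stops at the first step that exits non-zero.
            subprocess.run(argv, check=True)
    return commands
```

`run_recovery("esx01.example")` prints the four ssh commands without touching the host; pass `dry_run=False` to actually run them.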
Speaking of NTP, it may help to have a look at this: The hostd service in ESXi 7.0U3 crashes due to memory corruption (86283) (vmware.com)