I recently performed a round of upgrades on our main ESXi 6 cluster. vCenter is now running 6.5.15259038 and the cluster hosts are running 6.0.15169789. All the patching went OK, and this is the last version of 6.0 I plan to use before moving to vSphere 6.5. We're running Enterprise, with DRS enabled. The servers are all Dell PowerEdge R540s with the latest firmware, BIOS, etc.
Two weeks after applying the latest ESXi patch, I noticed one of the hosts was showing an alarm with the message 'Unable to apply DRS resource settings on host. . This can significantly reduce the effectiveness of DRS.'. I have seen this once before, so I planned to restart the host management agents. The console on this host was extremely slow to respond though, and in the end it became totally unresponsive, as did SSH. The VMs were still responding, but within the hour they all went unresponsive, and the host became unresponsive in the cluster. I tried a proper shutdown from the console, but that just hung, and I had no choice but to hard reboot the server. It came back up OK though, and VM operations resumed.
Now, a week or so later, another host in the cluster is reporting the same error. The VMs are still running, and the host is responding, but for how long I don't know. The previous build of ESXi we were running never had any DRS issues at all - it's only this latest patch that's causing it, by the looks of it. I'm getting very concerned now that this is going to be a regular problem on this build. I've just disabled DRS in the cluster, which isn't ideal, but it has cleared the host error for now. As I said, it only seems to be this latest build that's causing the issues, although I did do a minor vCenter patch a couple of weeks before doing the hosts.
Is this a known problem with recent builds? Should I just be looking to upgrade to 6.5 or 6.7 now? 6.0 has certainly done the business for us, and has been very reliable, but I'm now hesitant and a bit concerned.
I had a similar issue with 6.5 in December. Fortunately, restarting the management agents resolved it for me. It may be worth pre-emptively doing the restart or scheduling a reboot of the hosts.
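If you want to script the agent restart across several hosts, here is a rough sketch, not a tested procedure. It assumes SSH is enabled on the hosts and that you restart the agents with the usual `/etc/init.d/hostd restart` and `/etc/init.d/vpxa restart` commands from the ESXi shell. The function names and the host name in the usage note are made up for illustration, and it defaults to a dry run that only prints what it would execute.

```python
import subprocess

# Commands commonly used from the ESXi shell to restart the host
# management agents (hostd) and the vCenter agent (vpxa).
AGENT_RESTART_COMMANDS = [
    "/etc/init.d/hostd restart",
    "/etc/init.d/vpxa restart",
]


def build_ssh_commands(host, user="root"):
    """Return one ssh argument list per agent restart command.

    `host` and `user` are placeholders; SSH access to the ESXi host
    is an assumption of this sketch.
    """
    return [["ssh", f"{user}@{host}", cmd] for cmd in AGENT_RESTART_COMMANDS]


def restart_management_agents(host, user="root", dry_run=True):
    """Print (dry run) or execute the restart commands over SSH."""
    commands = build_ssh_commands(host, user)
    for argv in commands:
        if dry_run:
            # Show what would be run without touching the host.
            print(" ".join(argv))
        else:
            # Raises CalledProcessError if ssh or the remote command fails.
            subprocess.run(argv, check=True, timeout=120)
    return commands
```

Calling `restart_management_agents("esx01.example.local")` only prints the two ssh commands; pass `dry_run=False` once you are happy with them. Bear in mind that if a host is already as unresponsive as the first one was, SSH will probably hang too, so this only helps before it gets to that stage.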
6.0 goes end of support in March though, so you should look to upgrade if possible.
Thanks. Like I said, I have seen this problem before, but not on this cluster, and it's only since I applied the latest patches. To have two hosts within 2-3 weeks both giving the same error was more than a coincidence!
You're right, I do need to upgrade, and will be soon, but I'll be running 6.0 for another few weeks at least.