ESXi: 6.7.0, 15160138
Vcenter: 6.7.0.40000
Some VMs on my ESXi server were apparently powered down, because they're not running anymore. Whenever I try to start them through vCenter it says: "The host does not have sufficient memory resources to satisfy the reservation". When I go to the Summary tab of the specific ESXi server in vSphere and check the memory, there's plenty of memory free, and the VMs in question don't come anywhere near that amount.
For troubleshooting purposes, I shut down one VM (let's call it VM A) on that ESXi server which has a lot of memory allocated to it, and tried to start the previously mentioned VMs that were off. That didn't make a difference; I got the same error message. Now when I try to power on VM A again I can't, and I get the same error message as with the other VMs, while again there's plenty of memory free on the ESXi server and that VM was running fine before I shut it down.
I searched a bit for what could cause this, and it mostly refers to HA, vMotion, DR etc. I don't have any of that. I can't even find the HA section when I navigate to my ESXi server > Configure tab.
My server isn't exposed to the internet, but to be sure: the only thing I did last Monday or Wednesday was enable these workarounds for the recent security issues:
https://kb.vmware.com/s/article/82374
https://kb.vmware.com/s/article/76372
Anyone any idea what's going on here?
This is my free memory while trying to start an 8GB RAM VM:
This is the error message:
(removed the VM name from the screenshot)
Same when I try it directly through the ESXi interface (removed the VM name from the screenshot):
Command on ESXi shell:
[root@Censored:~] vsish -e get /memory/comprehensive
Comprehensive {
Physical memory estimate:66930164 KB
Given to VMKernel:66930164 KB
Reliable memory:0 KB
Discarded by VMKernel:1596 KB
Mmap buddy overhead:16652 KB
Kernel code region:20480 KB
Kernel data and heap:14336 KB
Other kernel:1519568 KB
Non-kernel:30204892 KB
Reserved memory at low addresses:330924 KB
Free:35152640 KB
}
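For what it's worth, the vsish figures above are in KB, so a quick conversion (nothing host-specific, just the numbers from the output above) shows the free vs. installed memory in GB:

```shell
# Convert the vsish figures above from KB to GB (1 GB = 1048576 KB)
free_gb=$(awk 'BEGIN { printf "%.1f", 35152640/1048576 }')
total_gb=$(awk 'BEGIN { printf "%.1f", 66930164/1048576 }')
echo "free: ${free_gb} GB of ${total_gb} GB"   # → free: 33.5 GB of 63.8 GB
```

So roughly 33.5 GB free out of ~64 GB given to the VMkernel, which is why the error message made no sense to me.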
>>> I can't even find the HA tab section when I navigate to my ESXi server > Configure tab
HA is configured on the Cluster level.
Issues like the one you describe often occur due to memory reservation settings for one (or more) VMs in the cluster. Depending on the HA Admission Control settings, HA calculates slot sizes for CPU and memory. When a VM is powered on, HA first checks whether there are sufficient slots available to meet the VM's needs.
Example: If just a single VM in the cluster has a 16GB RAM reservation, the memory slot size will be 16GB. So each VM - regardless of its configured memory settings - will be taken into the HA calculation as a multiple of this slot size.
What you can do is check your VMs' settings, i.e. whether VMs with CPU/memory reservations exist. If so, the options are to either remove the reservation (unless it's required) or modify the HA Admission Control settings.
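On a standalone host, one way to spot per-VM reservations from the shell is to grep the .vmx files for sched.mem.min, the key ESXi uses for a memory reservation. A minimal sketch; the datastore layout and VM names below are made up for the demo (on a real host you'd grep under /vmfs/volumes instead):

```shell
# Sketch: find VMs with a memory reservation by grepping .vmx files for
# sched.mem.min. A temp dir stands in for /vmfs/volumes; on a real host:
#   grep -l 'sched.mem.min' /vmfs/volumes/*/*/*.vmx
tmp=$(mktemp -d)
mkdir -p "$tmp/datastore1/vmA" "$tmp/datastore1/vmB"
printf 'sched.mem.min = "16384"\n' > "$tmp/datastore1/vmA/vmA.vmx"   # 16 GB reservation
printf 'memSize = "8192"\n'        > "$tmp/datastore1/vmB/vmB.vmx"   # no reservation
found=$(grep -l 'sched.mem.min' "$tmp"/*/*/*.vmx)
echo "$found"   # only vmA.vmx matches
```

Any file that shows up this way has a reservation configured and is worth a look in the vSphere client.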
Hint: An easy way to get a comprehensive overview (including VM reservations) is to use RVTools.
André
Hi André, thanks for your reply. I don't have any clusters, and I've looked everywhere for HA settings but couldn't find them. I'm not even sure our simple license supports that. I also never configured any HA or cluster settings.
This is what our cluster tab looks like in the data center where the ESXi server resides:
Any other ideas? I'm banging my head against a wall atm 😛
So the host has never been part of a cluster? That's indeed strange.
Can you please run esxcli software vib list |grep fdm from the host's command line, to check whether the HA agent is installed?
André
Correct, never been part of a cluster. This is my output on mentioned command:
[root@Censored:~] esxcli software vib list | grep fdm
[root@Censored:~]
The only other reason for this (other than a bug) that I could think of, are resource pool settings, which may prevent the VM from being powered on.
In case you are using resource pools, please check their settings.
André
I also don't use resource pools. I guess I'm going to shut down all VMs and give the host a restart then. Might as well update it too.
Thanks for your help.
Can you run the following on the host and post the output here between pre tags (like I did with the cmd) or e.g. on pastebin?
memstats -r group-stats -g0 -l2 -s gid:name:parGid:nChild:min:max:conResv:availResv:memSize -u mb 2> /dev/null | sed -n '/^-\+/,/.*\n/p'
Thanks for taking the time to respond to my thread, vbondzio. I already restarted (and updated) the host yesterday and everything is fine again, so I'm not sure if this info will be of any use now, but here you go:
[root@Censored:~] memstats -r group-stats -g0 -l2 -s gid:name:parGid:nChild:min:max:conResv:availResv:memSize -u mb 2> /dev/null | sed -n '/^-\+/,/.*\n/p'
--------------------------------------------------------------------------------------------------------------
gid name parGid nChild min max conResv availResv memSize
--------------------------------------------------------------------------------------------------------------
0 host -1 4 65037 65037 5759 59278 51961
1 system 0 10 2851 -1 2835 59294 2683
2 vim 0 4 0 -1 2332 59278 475
3 iofilters 0 3 0 -1 25 59278 11
4 user 0 12 0 -1 552 59278 48793
--------------------------------------------------------------------------------------------------------------
If the issue is gone then it isn't of too much use 🙂
On that host, at that point in time, further VMs should have been able to reserve about 58GB (59278 MB) of memory. It isn't recommended to fully reserve all of a host's resources, but a VM with 59277 MB of reserved memory (incl. overhead, so the VM itself could have maybe 58000 MB configured) should be able to power on (assuming there is no HA / admission control).
Right now it seems no VM has any substantial reservations; the 552 MB sum of reserved memory in the user pool is most likely just overhead memory.
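The ~58GB figure follows directly from the availResv column of the memstats output, which is in MB:

```shell
# availResv for the host group is 59278 MB; convert to GB (1 GB = 1024 MB)
avail_gb=$(awk 'BEGIN { printf "%.1f", 59278/1024 }')
echo "${avail_gb} GB"   # → 57.9 GB
```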
If this happens again, use this command for a high-level overview, or just match all groups that reserve memory via:
[root@esx01:~] memstats -r group-stats -s gid:name:min:max:minlimit:shares:conResv:availResv:memSize -u mb | sed -n '/^-\+/,/.*\n/p' | awk 'NR == 3 || $3 !~ /^0/ {print $0}'
------------------------------------------------------------------------------------------------------------------
 gid name                               min   max minLimit     shares conResv availResv memSize
------------------------------------------------------------------------------------------------------------------
   0 host                             65117 65117    65117 8589934591    8792     56325    6279
   1 system                            3579    -1       -1        500    3563     56341    3401
  76 BufferCache                          1  6736       -1       1000       0      6736      40
  88 physMem                              1     1       -1         -3       0         1       1
  89 heap:VsiHeap+0x0                     1    31       -1         -3       0        31       1
  91 heap:refCount+0x1                    1    15       -1         -3       0        15       3
  92 heap:VmkAccessDomainTableHeap+0x2    1     1       -1         -3       0         1       1
  94 heap:vmkaccessLog+0x3                1     4       -1         -3       0         4       1
(...)
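For reference, the awk stage of that pipeline can be replayed on any captured memstats output: it drops every data row whose min column (field 3) starts with "0", i.e. groups holding no reservation, while the header and separator lines pass through (their third field is "min" or empty). A self-contained demo on a shortened sample:

```shell
# Replay the awk filter from the pipeline above on captured sample output.
# Data rows with min (field 3) starting with "0" are dropped; the header
# and separator lines pass because their third field is "min" or empty.
out=$(awk 'NR == 3 || $3 !~ /^0/ {print $0}' <<'EOF'
--------------------------------------------------------
 gid name      min   max conResv availResv memSize
--------------------------------------------------------
   0 host    65117 65117    8792     56325    6279
   4 user        0    -1     552     59278   48793
EOF
)
echo "$out"   # the "user" row (min = 0) is filtered out
```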
That's some useful info and some great advice. Thanks, I will keep this in mind.