Hi,
I have a 6.6 4-node vSAN cluster. I am not able to create new VMs or power on the existing VMs. The error is "maximum memory congestion reached, failed to create object". Can someone please help me solve this issue?
On the vSAN Observer, for node1, the SSD write buffer is showing as 96% full, LLOG data space as 95%, and memory is showing congestion.
# esxcli vsan health cluster list | grep red
Overall health red (Physical disk issue)
Physical disk red
Overall disks health red
Congestion red
Memory pools (heaps) red
Data red
vSAN object health red
Thank you in advance.
Hello,
Welcome to vSAN Communities, I hope you find some good info on this forum.
There are different types of congestion; what you described here could be SSD or memory congestion:
kb.vmware.com/kb/2071384
Can you screenshot or copy+paste some logging from the host? Via SSH or DCUI, run:
# tail -f /var/log/vmkernel.log
or
# dmesg
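To narrow the log down to congestion-related entries, something like the following should work (the LSOM/PLOG pattern is an assumption about which vSAN subsystems log these messages, so adjust as needed):

# grep -iE "congestion|LSOM|PLOG" /var/log/vmkernel.log | tail -20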
Does this host have multiple disk groups and what is the current output of:
# localcli vsan storage list
If it only has one disk group (the one that is currently impacted), then you should check the current object accessibility states from the Health check and determine what the impact of isolating and/or rebooting this host would be. If everything is accessible/rebuilt without this host, it *may* be safe to reboot it; however, this is not guaranteed to fix the problem if there is a physical/firmware fault.
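If I recall correctly, vSAN 6.6 added an esxcli debug namespace, so object accessibility can also be checked from the host shell (assuming that namespace is present in your build):

# esxcli vsan debug object health summary get

This summarizes how many objects are healthy versus reduced-availability or inaccessible, which is the key input for deciding whether a reboot of this host is safe.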
If you have a support agreement with VMware I advise opening an SR so the impact and type of congestion can be addressed.
Bob
Hello,
That is memory congestion alright.
I don't think it will be possible to get this host back to a properly manageable state without rebooting it (and a reboot won't fix it if this is caused by a hardware failure).
Check the current Data health from the Web Client Health-check, if all data got rebuilt on the remaining nodes then rebooting this host shouldn't have any data accessibility impact.
It is possible to *temporarily* test the impact of this host's data being inaccessible to the cluster (as it would be during a reboot) by untagging vSAN traffic on the vmk interface on this host that is configured for vSAN. (Re-enable it if objects go inaccessible, and plan down-time for these inaccessible VMs accordingly.)
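For reference, untagging and re-tagging vSAN traffic would look something like this; vmk1 here is an assumption, so check the list output first for the interface actually tagged for vSAN on this host:

# esxcli vsan network list
# esxcli vsan network ip remove -i vmk1
# esxcli vsan network ip add -i vmk1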
Are you able to vMotion VMs off this host or is it non-responsive?
If VMs are still on this host, are they still up and functional?
(If the VMs are still up and vMotion is not possible, shut down the Guest OSes prior to rebooting.)
Bob
Is this a physical environment or a nested lab? (I assume it is a lab, considering you don't have support.)