Hi,
I have a 6.6 4-node vSAN cluster. I am not able to create new VMs or power on the existing VMs. The error is "maximum memory congestion reached, failed to create object". Can someone please help me solve this issue?
On the vSAN Observer, for node1, the SSD write buffer is showing as 96% full, LLOG data space as 95%, and memory is showing congestion.
# esxcli vsan health cluster list | grep red
Overall health red (Physical disk issue)
Physical disk red
Overall disks health red
Congestion red
Memory pools (heaps) red
Data red
vSAN object health red
Thank you in advance.
Hello,
Welcome to vSAN Communities, I hope you find some good info on this forum.
There are different types of congestion; what you described here could be SSD or memory congestion:
kb.vmware.com/kb/2071384
Can you screenshot or copy+paste some logging from the host? Via SSH or DCUI, run:
# tail -f /var/log/vmkernel.log
or
# dmesg
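To narrow the log down to congestion-related entries, something like the following should work (the LSOM/PLOG pattern is an assumption about which vSAN subsystems log these messages, so adjust as needed):

# grep -iE "congestion|LSOM|PLOG" /var/log/vmkernel.log | tail -20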
Does this host have multiple disk groups and what is the current output of:
# localcli vsan storage list
If it only has one disk group (the one that is currently impacted), then you should check the current object accessibility states from the Health check and determine what the impact of isolating and/or rebooting this host would be. If everything is accessible/rebuilt without this host, it *may* be safe to reboot it; however, this is not guaranteed to fix the problem if there is a physical/firmware fault.
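If I recall correctly, vSAN 6.6 added an esxcli debug namespace, so object accessibility can also be checked from the host shell (assuming that namespace is present in your build):

# esxcli vsan debug object health summary get

This summarizes how many objects are healthy versus reduced-availability or inaccessible, which is the key input for deciding whether a reboot of this host is safe.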
If you have a support agreement with VMware I advise opening an SR so the impact and type of congestion can be addressed.
Bob
Hello,
That is memory congestion alright.
I don't think it will be possible to get this host back to a properly manageable state without rebooting it (and a reboot won't fix it if this is caused by a hardware failure).
Check the current Data health from the Web Client Health-check, if all data got rebuilt on the remaining nodes then rebooting this host shouldn't have any data accessibility impact.
It is possible to *temporarily* test the impact of this host's data being inaccessible to the cluster (as it would be during a reboot) by untagging vSAN traffic on the vmk interface on this host that is configured for vSAN. (Re-enable it if objects go inaccessible, and plan down-time for these inaccessible VMs accordingly.)
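For reference, untagging and re-tagging vSAN traffic would look something like this; vmk1 here is an assumption, so check the list output first for the interface actually tagged for vSAN on this host:

# esxcli vsan network list
# esxcli vsan network ip remove -i vmk1
# esxcli vsan network ip add -i vmk1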
Are you able to vMotion VMs off this host or is it non-responsive?
If VMs are still on this host, are they still up and functional?
(If the VMs are still up and vMotion is not possible, shut down the Guest OSes prior to rebooting.)
Bob
Is this a physical environment or a nested lab? (I assume it is a lab, considering you don't have support.)