VMware Cloud Community
CristianSalguei
Contributor

VCF update precheck fails due to cluster load

Hi!

I am trying to upgrade a VI domain in SDDC Manager 3.10.2. This workload domain contains a non-production cluster with high CPU and memory usage. While the precheck is running, this triggers an ERROR in lcm.log: the hosts supposedly can't enter maintenance mode because the VMs cannot be moved using vMotion. (This is not true; even with resource contention, the VMs can be moved.)

The following appears in /var/log/vmware/vcf/lcm/lcm.log:

2021-09-08T15:50:04.417+0000 ERROR [0000000000000000,0000,precheckId=503be13f-57ca-485c-b6a2-788c35bf1ca6,resourceType=ESX,resourceId=9a59d50e-3e91-43c3-8b7c-3e312664bf6a] [c.v.e.s.l.c.v.vsphere.VsphereUtils,pool-2-thread-229] Error during enter MAINTENANCE check due to InsufficientResourcesFault { "_msg": "Insufficient resources.", "_faultMsg": [ { "key": "com.vmware.cdrs.maintenancemode.clusterLoadViolated", "arg": [ { "key": "threshold", "value": 80 }, { "key": "clustload", "value": 90 }, { "key": "resource", "value": "cpu" } ], "message": "Host cannot enter maintenance mode since the resulting cluster cpu load (90%) exceeds the tolerence threshold (80%)." } ], "stackTrace": [], "suppressedExceptions": [] } 

 

The key part seems to be "key": "threshold", "value": 80. If the resulting cluster load is over 80% (in my case 90%), the precheck fails.
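For anyone who wants to compare several of these entries, here is a minimal sketch that pulls the fault arguments out of such a log line. It assumes the JSON payload starts at the first "{" on the line, which holds for the entry above but may not hold for every lcm.log line. The function name parse_fault_args is just an illustrative helper, not part of any VMware tooling.

```python
import json

# Sample taken from the lcm.log entry above (timestamp and thread prefix omitted).
SAMPLE_LINE = (
    'Error during enter MAINTENANCE check due to InsufficientResourcesFault '
    '{ "_msg": "Insufficient resources.", "_faultMsg": [ { "key": '
    '"com.vmware.cdrs.maintenancemode.clusterLoadViolated", "arg": [ '
    '{ "key": "threshold", "value": 80 }, { "key": "clustload", "value": 90 }, '
    '{ "key": "resource", "value": "cpu" } ], "message": "Host cannot enter '
    'maintenance mode since the resulting cluster cpu load (90%) exceeds the '
    'tolerence threshold (80%)." } ], "stackTrace": [], "suppressedExceptions": [] }'
)


def parse_fault_args(log_line):
    """Return one {arg key: value} dict per fault message found in the line."""
    start = log_line.find("{")  # assumption: the JSON payload starts here
    if start == -1:
        return []
    # raw_decode parses the first JSON object and ignores anything after it.
    payload, _ = json.JSONDecoder().raw_decode(log_line[start:])
    return [
        {arg["key"]: arg["value"] for arg in fault.get("arg", [])}
        for fault in payload.get("_faultMsg", [])
    ]


for args in parse_fault_args(SAMPLE_LINE):
    print(args)  # {'threshold': 80, 'clustload': 90, 'resource': 'cpu'}
```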

The only related parameter I could find in the config files is DrsDemandCapacityRatioForRemediation, in the file /usr/lib/vmware-updatemgr/bin/vci-integrity.xml on the VCSA appliance. I couldn't find anything equivalent in SDDC Manager.

I changed DrsDemandCapacityRatioForRemediation from 150 to 200 and then to 400, but the error persists.

VMware support says that in VCF 4.x there is a flag to skip this verification. If anyone knows which setting I need to modify to avoid this check, I would be very thankful.

 


 

Another option might be to proceed with the upgrade regardless of the error and move the VMs manually, but we haven't tried that yet; it will be our last resort.

 

Thanks!

4 Replies
madey83
Contributor

Hi,

We had the same issue with VCF 3.10.1, and VMware stated that there is a hardcoded limit of 80% on consumed/assigned resources.
The solution, at least in that version, was to manually move VMs between clusters or to expand the cluster's resources.
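As a rough back-of-the-envelope sketch of how much would have to be moved or added: the snippet below assumes the reported load is simply demand divided by capacity, which is a simplification and not something VMware has confirmed. The function names are made up for illustration.

```python
def load_to_shed(load_pct, threshold_pct):
    """Percentage points of cluster capacity that must be freed (0 if already below)."""
    return max(0.0, load_pct - threshold_pct)


def capacity_growth_needed(load_pct, threshold_pct):
    """Fractional capacity increase needed if demand stays unchanged (0 if already below)."""
    return max(0.0, load_pct / threshold_pct - 1.0)


# Numbers from the fault message: resulting load 90%, threshold 80%.
print(load_to_shed(90, 80))            # 10 -> free about 10 points of capacity
print(capacity_growth_needed(90, 80))  # 0.125 -> or add about 12.5% capacity
```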

CristianSalguei
Contributor

Thanks, @madey83! We are still waiting for VMware's reply. They told us they will come back with a workaround, since moving the VMs is not an option on our side. We want to modify that hardcoded value.

CristianSalguei
Contributor

VMware technical support says that there is no workaround for this version. That is their final response.

DeviVmware
VMware Employee

Have you tried disabling HA admission control?
