Understanding/Changing CPU Contention Heat Maps since upgrade to 6.7

Groove200uk · ‎10-10-2018

Hey

We've just upgraded to 6.7 and, magically, overnight two of our clusters have gone from "DANGER DANGER, buy stuff now" to "you're OK for at least a year"!

Mind blown....best upgrade we've ever done 😉

But seriously, something seems amiss here: either 6.5 was seriously broken, or 6.7 is telling us a completely different version of the truth.

For example, I don't understand why the host selected on the left is nice and green from a contention colouration perspective when it's over 100% demand and the graph on the right is showing this?

And this shows only a slight blip and yellow at 85% ?

If nothing else, can I adjust the colouring of these graphs anywhere and lower the thresholds, say to 80% as the yellow trigger? (I looked in policies but can't find one that would affect these.)

For comparison, this is what 6.5 looked like! A slightly different story.

GayathriS · ‎10-10-2018

There are some fundamental changes to the way we look at capacity management.

Please check the blog below, which helps in understanding the change:

https://blogs.vmware.com/management/2018/04/real-time-capacity-management-vrealize-operations-6-7.ht...

regards

Gayathri

sxnxr · ‎10-11-2018

Your first screenshot and point.

Contention and demand are not directly related. You can have low demand and high contention, or vice versa.

Demand is how much (in this case) CPU the host is being asked for, so it can be over 100%. Here the combined CPU demand of all the VMs is 2.76% more than is physically available in the host.
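To make the "over 100%" point concrete, here is a minimal sketch of the arithmetic. It assumes demand is simply the sum of per-VM CPU demand divided by the host's physical capacity; the function name and the GHz figures are made up for illustration and are not taken from the screenshots or from the product's actual formula.

```python
def demand_percent(vm_demand_ghz, host_capacity_ghz):
    """Host CPU demand as a percentage of physical capacity.

    Assumption: demand is the plain sum of what each VM is asking for.
    The result is not capped, so it can exceed 100.
    """
    return sum(vm_demand_ghz) / host_capacity_ghz * 100.0


# Hypothetical host with 100 GHz of physical CPU and three VMs
# whose combined demand is slightly more than that.
host_capacity_ghz = 100.0
vm_demand_ghz = [40.0, 35.0, 27.76]

pct = demand_percent(vm_demand_ghz, host_capacity_ghz)
print(f"Demand: {pct:.2f}% of physical capacity")
```

Nothing in this calculation says anything about contention, which is why a host can show more than 100% demand and still be coloured green on a contention heat map.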

Contention is generated when there are too many vCPUs per physical core and the vCPUs have to contend with each other. The lower the better.

It all depends on which metric you choose to colour by. In this case it is contention.

The difference between the first two screenshots depends on the refresh interval. When you load the dashboard, the 300-second countdown starts on the heat map. When you click the host, the graph shows you the latest collected stat and its own 300-second countdown starts. So you can get a situation where the chart is more up to date than the heat map. On the chart you can mouse over the point 5 minutes before the current one, and it may match the heat map.

You could also have the heat map more up to date than the chart. If you load the dashboard, don't do anything for 2 minutes and then select the host, the chart will update and start its 300-second countdown while the heat map is already nearly halfway through its refresh cycle. Once the heat map refreshes, it will be the newer of the two until the chart's own countdown ends.
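The timing described above can be sketched as two independent timers. This is only an illustration of the reasoning, assuming each widget refreshes exactly every 300 seconds from the moment it was first loaded; the function names are hypothetical and this is not how the product is implemented.

```python
REFRESH_SECONDS = 300  # refresh interval mentioned in the post


def last_refresh(widget_start, now, interval=REFRESH_SECONDS):
    """Time of the most recent refresh of a widget whose timer began at widget_start."""
    if now < widget_start:
        raise ValueError("widget has not been loaded yet")
    completed_cycles = (now - widget_start) // interval
    return widget_start + completed_cycles * interval


def staleness(widget_start, now, interval=REFRESH_SECONDS):
    """Seconds since the widget last pulled data."""
    return now - last_refresh(widget_start, now, interval)


# Dashboard loaded at t=0 (heat map timer starts),
# host clicked at t=120 (chart timer starts).
heatmap_start, chart_start = 0, 120

for now in (130, 310):
    print(
        f"t={now}s: heat map data is {staleness(heatmap_start, now)}s old, "
        f"chart data is {staleness(chart_start, now)}s old"
    )
```

Shortly after the click the chart holds the fresher data, but once the heat map passes its own 300-second mark the situation flips, so the two widgets can disagree in either direction.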

Your 6.5 view is 100% of usable capacity, not of the total, because you are taking 22% off the total to get usable. In 6.7 it is demand as a share of the total. To get a closer comparison in 6.5: you have a total of 481.1 GHz and a peak of 459.5 GHz, leaving approx 4.5% free.
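A rough sketch of that comparison, using the two GHz figures quoted above. It assumes the 22% is a buffer subtracted from total capacity to arrive at "usable"; the function and key names are illustrative only.

```python
def capacity_views(total_ghz, peak_ghz, buffer_fraction):
    """Express the same peak demand against total and against usable capacity.

    Assumption: usable capacity = total capacity minus a fixed buffer fraction.
    """
    usable_ghz = total_ghz * (1.0 - buffer_fraction)
    pct_of_total = peak_ghz / total_ghz * 100.0
    return {
        "usable_ghz": usable_ghz,
        "pct_of_total": pct_of_total,
        "free_of_total": 100.0 - pct_of_total,
        "pct_of_usable": peak_ghz / usable_ghz * 100.0,
    }


views = capacity_views(total_ghz=481.1, peak_ghz=459.5, buffer_fraction=0.22)
for name, value in views.items():
    print(f"{name}: {value:.2f}")
```

Measured against usable capacity the peak is well over 100%, while against the total it still leaves a little headroom, which goes some way to explaining why the same cluster looks so different between the two versions.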

I am not 100% sure what 6.5 uses for the demand-based capacity model, but if you expand the CPU section until you see the graph, you can see what it is using.
