Yeah, turns out it's super hard to get the balance right (automatically). My rule of thumb is to disable LocalityWeightActionAffinity on hosts with multiple VMs that are roughly half the size of a pNUMA node or larger and expected to be busy. Is there something in the wording of https://kb.vmware.com/s/article/2097369 we should update? I'm pretty sure I wrote the current iteration of that KB, and it could maybe be a bit more prescriptive. It's just very hard not to be "handwavy" ...
We are constantly improving the algorithm though, most recently in 7.0 U2. You can of course always use CPU reservations which everyone seems to forget about ...
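To make the rule of thumb above a bit less handwavy, here is a minimal Python sketch of it. Everything in it is an assumption for illustration: the `VM` record, the function name, reading "multiple" as "at least two", and treating "~half" as an exact cutoff of half the cores of a pNUMA node. It does not query ESXi and is no substitute for looking at the actual load.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    """Hypothetical description of a VM; not an ESXi data structure."""
    name: str
    vcpus: int
    expected_busy: bool


def should_consider_disabling(cores_per_pnuma_node: int, vms: List[VM]) -> bool:
    """Return True if the host matches the rule of thumb.

    Assumptions: "~half as large" is an exact >= cores/2 cutoff,
    and "multiple VMs" means two or more.
    """
    threshold = cores_per_pnuma_node / 2
    large_busy = [vm for vm in vms if vm.expected_busy and vm.vcpus >= threshold]
    return len(large_busy) >= 2


# Example host: 16 cores per pNUMA node, two large busy VMs, one small idle VM.
example_vms = [
    VM("db01", vcpus=8, expected_busy=True),
    VM("app01", vcpus=12, expected_busy=True),
    VM("jump01", vcpus=2, expected_busy=False),
]
print(should_consider_disabling(16, example_vms))
```

With only one large busy VM on the host the function returns False, which matches the "multiple VMs" part of the rule. The cutoff is deliberately simple; in practice "~half" leaves room for judgment.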
For your specific case, I assume the VMs were doing some sort of (external) IO? That IO device was likely attached to the (over)crowded node, hence the locality "benefit". You could run https://github.com/vbondzio/sowasvonunsupported/blob/master/pci2numa.sh to check the device locality (6.7 and newer).
You can check the relationships of worlds via:
[root@esxi04:~] sched-stats -t vcpu-comminfo
vcpu leader name isRel type id rate isRel type id rate isRel type id rate (...)
1048601 1048601 fastslab n 2 1048765 1 n 2 1048763 1 (...)
1048602 1048602 SVGAConsole n 2 1048756 1 (...)
1048606 1048606 tlbflushcount n 2 1048873 1 n 2 1048781 1 n 2 1050268 1 (...)
1048607 1048607 tlbflushcounttryflus n 2 1048671 2 n 2 1048586 1 n 2 1048650 1 (...)
1048614 1048614 ndiscWorld n 2 1048751 1 (...)
1048622 1048622 CmdCompl-4 n 2 1048819 1 (...)
1048624 1048624 CmdCompl-6 n 2 1051028 1 n 2 1051415 1 n 2 1051080 1 (...)
1048625 1048625 CmdCompl-7 n 2 1051418 1 n 2 1048873 1 n 2 1051024 1 (...)
1048628 1048628 CmdCompl-10 n 2 1049045 1 n 2 1051418 1 n 2 1048819 1 (...)
1048629 1048629 CmdCompl-11 n 2 1048873 1 n 2 1051065 1 (...)
1048632 1048632 CmdCompl-14 n 2 1051030 1 n 2 1051416 1 n 2 1051417 1 (...)
1048633 1048633 CmdCompl-15 n 2 1051028 1 n 2 1197403 1 n 2 1051030 1 (...)
1048634 1048634 CmdCompl-16 n 2 1207887 1 (...)
1048638 1048638 CmdCompl-20 n 2 1051066 1 n 2 1051416 1 n 2 1051024 1 (...)
1048639 1048639 CmdCompl-21 n 2 1051023 1 n 2 1051065 1 n 2 1051024 1 (...)
1048641 1048641 CmdCompl-23 n 2 1152278 1 (...)
1048643 1048643 AsyncTimeout n 2 1048764 1 n 2 1048758 1 (...)
1048644 1048644 DeviceTaskmgmtWatchd n 2 1048764 1 (...)
(...) (...)
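If you want to post-process that output, a rough Python sketch of a parser could look like the following. It assumes the layout shown above: three leading columns (vcpu, leader, name) followed by repeating groups of isRel, type, id, rate, with names that contain no spaces. The function and field names are made up for illustration, the sample text is copied from the first two rows above, and the sketch does not interpret what the type or rate values mean.

```python
from typing import Dict, List, NamedTuple


class Relation(NamedTuple):
    is_rel: str    # "isRel" column as printed
    rel_type: int  # "type" column, not interpreted here
    world_id: int  # "id" column, the related world
    rate: int      # "rate" column


def parse_comminfo(text: str) -> Dict[int, dict]:
    """Parse saved `sched-stats -t vcpu-comminfo` output into a dict keyed by vcpu id."""
    worlds: Dict[int, dict] = {}
    for line in text.splitlines():
        # Drop the "(...)" elision markers used in the pasted output.
        tokens = [t for t in line.split() if t != "(...)"]
        if len(tokens) < 3 or not (tokens[0].isdigit() and tokens[1].isdigit()):
            continue  # header, blank line, or fully elided row
        rest = tokens[3:]
        usable = len(rest) - len(rest) % 4  # ignore a truncated trailing group
        relations: List[Relation] = [
            Relation(rest[i], int(rest[i + 1]), int(rest[i + 2]), int(rest[i + 3]))
            for i in range(0, usable, 4)
        ]
        worlds[int(tokens[0])] = {
            "leader": int(tokens[1]),
            "name": tokens[2],
            "relations": relations,
        }
    return worlds


SAMPLE = """\
vcpu leader name isRel type id rate isRel type id rate isRel type id rate (...)
1048601 1048601 fastslab n 2 1048765 1 n 2 1048763 1 (...)
1048602 1048602 SVGAConsole n 2 1048756 1 (...)
(...) (...)
"""

parsed = parse_comminfo(SAMPLE)
for vcpu, info in parsed.items():
    related = [r.world_id for r in info["relations"]]
    print(vcpu, info["name"], related)
```

From there you could, for example, group worlds by the ids they communicate with to see which ones the scheduler might want to keep close together.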