We are running an Ubuntu 20.04 VM on ESXi 7.0 Update 1. The host has two AMD EPYC 7742 64-Core Processors, i.e. 256 logical CPUs in total (2 sockets x 64 cores x 2 SMT threads).
Assigning more than 128 vCPUs to the VM requires enabling the IOMMU. With it enabled, we see lots of CPU hard lockups in the Ubuntu VM.
Running with fewer than 128 cores does not have this issue.
And you are certain that nothing but the vIOMMU change is causing those lockups, and not the large change in vCPU count? I.e. you tested with 127 and 129 vCPUs under the same load? The reason I ask is that above 128 vCPUs you also start using SMT, which the guest isn't aware of, so it might expect linear scaling for the additional workers.
There is no way around the vIOMMU, because more than 128 vCPUs requires APIC IDs beyond 255 (8 bit), which requires x2APIC, which requires interrupt remapping, which requires, you guessed it, a vIOMMU.
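To make the APIC ID arithmetic concrete, here is a rough sketch. It assumes the generic bit-field layout for APIC IDs (SMT bits, then core bits, then package bits, each field padded to a power of two); the layout ESXi actually presents to the guest is not something stated in this thread, and the function names are made up for illustration. It only shows why the numbering can become sparse and reach the 8-bit limit before the CPU count itself does.

```python
# Sketch only: assumes the generic SMT | core | package bit-field layout,
# not necessarily what ESXi exposes to the guest.

XAPIC_BROADCAST_ID = 0xFF  # in xAPIC mode, destination 0xFF means "all CPUs"


def field_bits(count: int) -> int:
    """Bits needed to number `count` items starting from zero."""
    return (count - 1).bit_length()


def max_apic_id(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Highest APIC ID used by a topology under the assumed layout."""
    smt_bits = field_bits(threads_per_core)
    core_bits = field_bits(cores_per_socket)
    return (
        ((sockets - 1) << (core_bits + smt_bits))
        | ((cores_per_socket - 1) << smt_bits)
        | (threads_per_core - 1)
    )


def needs_x2apic(sockets: int, cores_per_socket: int, threads_per_core: int) -> bool:
    """True if some CPU would need an ID that 8-bit xAPIC cannot address."""
    return max_apic_id(sockets, cores_per_socket, threads_per_core) >= XAPIC_BROADCAST_ID


if __name__ == "__main__":
    # 128 CPUs without SMT: highest ID is 127, fits in xAPIC.
    print(max_apic_id(1, 128, 1), needs_x2apic(1, 128, 1))
    # Host-like topology, 2 x 64 x 2 = 256 CPUs: highest ID is 255.
    print(max_apic_id(2, 64, 2), needs_x2apic(2, 64, 2))
```

Under these assumptions the second case lands exactly on 255, which xAPIC reserves as the broadcast destination, so that CPU cannot be addressed individually without x2APIC's 32-bit IDs.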