VMware Cloud Community
Blaze4up
Enthusiast

E105: PANIC: PhysMem: creating too many Global lookups

Hi all,

After running for a long time without any problems, some VMs started to hang.

They don't react to anything. CPU usage is 0 Hz, and if you try to open the VMware console of the VM, the connection is interrupted.

So I started to dig into the logs of the virtual machine and found a lot of log entries like this:

Log for VMware ESX version=6.7.0 build=build-14320388

2019-12-07T14:14:01.622Z| vcpu-2| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.627Z| vcpu-0| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.630Z| vcpu-0| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.634Z| vcpu-4| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.637Z| vcpu-0| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.639Z| vcpu-0| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.643Z| vcpu-3| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.647Z| vcpu-2| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.650Z| vcpu-0| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.653Z| vcpu-2| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.657Z| vcpu-2| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.660Z| vcpu-2| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.662Z| vcpu-2| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.668Z| vcpu-2| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.673Z| vcpu-2| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

2019-12-07T14:14:01.679Z| vcpu-4| W115: Memory regions  (0xfc000000, 0xfcfff000) and  (0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff

And after this:

2019-12-07T14:14:01.679Z| vcpu-4| E105: PANIC: PhysMem: creating too many Global lookups.

2019-12-07T14:14:08.634Z| vcpu-4| W115: A core file is available in "/vmfs/volumes/5cdd51ee-fd4310f2-58c4-24b6fd652bce/0-pg-virtgpu008/vmx-zdump.000"

2019-12-07T14:14:08.634Z| mks| W115: Panic in progress... ungrabbing

2019-12-07T14:14:08.634Z| mks| I125: MKS: Release starting (Panic)

2019-12-07T14:14:08.634Z| mks| I125: MKS: Release finished (Panic)

2019-12-07T14:14:08.643Z| vcpu-4| I125: Writing monitor file `vmmcores.gz`

2019-12-07T14:14:08.722Z| vcpu-4| W115: Dumping core for vcpu-0

2019-12-07T14:14:08.722Z| vcpu-4| I125: VMK Stack for vcpu 0 is at 0x451ae7c93000

2019-12-07T14:14:08.722Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:09.115Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:09.116Z| vcpu-4| W115: Dumping core for vcpu-1

2019-12-07T14:14:09.116Z| vcpu-4| I125: VMK Stack for vcpu 1 is at 0x451af3b13000

2019-12-07T14:14:09.116Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:09.510Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:09.510Z| vcpu-4| W115: Dumping core for vcpu-2

2019-12-07T14:14:09.510Z| vcpu-4| I125: VMK Stack for vcpu 2 is at 0x451aebb13000

2019-12-07T14:14:09.510Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:09.904Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:09.905Z| vcpu-4| W115: Dumping core for vcpu-3

2019-12-07T14:14:09.905Z| vcpu-4| I125: VMK Stack for vcpu 3 is at 0x451aeb713000

2019-12-07T14:14:09.905Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:10.300Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:10.300Z| vcpu-4| W115: Dumping core for vcpu-4

2019-12-07T14:14:10.300Z| vcpu-4| I125: VMK Stack for vcpu 4 is at 0x451affd13000

2019-12-07T14:14:10.300Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:10.692Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:10.693Z| vcpu-4| W115: Dumping core for vcpu-5

2019-12-07T14:14:10.693Z| vcpu-4| I125: VMK Stack for vcpu 5 is at 0x451af0313000

2019-12-07T14:14:10.693Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:11.085Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:11.085Z| vcpu-4| W115: Dumping core for vcpu-6

2019-12-07T14:14:11.085Z| vcpu-4| I125: VMK Stack for vcpu 6 is at 0x451afcb13000

2019-12-07T14:14:11.085Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:11.474Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:11.474Z| vcpu-4| W115: Dumping core for vcpu-7

2019-12-07T14:14:11.474Z| vcpu-4| I125: VMK Stack for vcpu 7 is at 0x451af2813000

2019-12-07T14:14:11.474Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:11.941Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:11.941Z| vcpu-4| W115: Dumping core for vcpu-8

2019-12-07T14:14:11.941Z| vcpu-4| I125: VMK Stack for vcpu 8 is at 0x451af4913000

2019-12-07T14:14:11.941Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:12.334Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:12.334Z| vcpu-4| W115: Dumping core for vcpu-9

2019-12-07T14:14:12.335Z| vcpu-4| I125: VMK Stack for vcpu 9 is at 0x451ae7293000

2019-12-07T14:14:12.335Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:12.728Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:12.728Z| vcpu-4| W115: Dumping core for vcpu-10

2019-12-07T14:14:12.728Z| vcpu-4| I125: VMK Stack for vcpu a is at 0x451af4f93000

2019-12-07T14:14:12.728Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:13.121Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:13.122Z| vcpu-4| W115: Dumping core for vcpu-11

2019-12-07T14:14:13.122Z| vcpu-4| I125: VMK Stack for vcpu b is at 0x451ae8913000

2019-12-07T14:14:13.122Z| vcpu-4| I125: Beginning monitor coredump

2019-12-07T14:14:13.514Z| vcpu-4| I125: End monitor coredump

2019-12-07T14:14:34.966Z| vcpu-4| I125: Printing loaded objects

So the VMs have crashed, and it looks memory related.

I have more VMs like this.
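
To check whether other VMs show the same pattern, the log lines above can be scanned with a small script. This is only a sketch: the regular expressions are inferred from the excerpt in this post (not from any VMware documentation), the helper names `summarize` and `regions_overlap` are made up for illustration, and reading each logged pair as an inclusive (start, end) address range is an assumption.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpt in this thread; other builds may log differently.
OVERLAP_RE = re.compile(
    r"W115: Memory regions\s+\((0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+)\) and\s+"
    r"\((0x[0-9a-fA-F]+), (0x[0-9a-fA-F]+)\) overlap"
)
PANIC_RE = re.compile(r"E105: PANIC: (.*)")


def regions_overlap(a, b):
    """Return True if two ranges intersect.

    Assumption: each pair is an inclusive (start, end) address range.
    """
    return a[0] <= b[1] and b[0] <= a[1]


def summarize(lines):
    """Count overlap warnings per region pair and collect PANIC messages."""
    overlaps = Counter()
    panics = []
    for line in lines:
        m = OVERLAP_RE.search(line)
        if m:
            a_start, a_end, b_start, b_end = (int(x, 16) for x in m.groups())
            overlaps[((a_start, a_end), (b_start, b_end))] += 1
            continue
        p = PANIC_RE.search(line)
        if p:
            panics.append(p.group(1).strip())
    return overlaps, panics


# Two lines copied from the log above, used as sample input.
SAMPLE = [
    "2019-12-07T14:14:01.622Z| vcpu-2| W115: Memory regions  (0xfc000000, 0xfcfff000) and  "
    "(0xfc810000, 0xfc81f000) overlap (0x54f0024000 0x5520026000).vcs = 0xfff, vpcuId = 0xffffffff",
    "2019-12-07T14:14:01.679Z| vcpu-4| E105: PANIC: PhysMem: creating too many Global lookups.",
]

overlaps, panics = summarize(SAMPLE)
for (a, b), count in overlaps.items():
    print(f"{count}x {a[0]:#x}-{a[1]:#x} vs {b[0]:#x}-{b[1]:#x}, overlap={regions_overlap(a, b)}")
for message in panics:
    print("PANIC:", message)
```

For a real log, something like `summarize(open("vmware.log", errors="replace"))` should work, since the function only iterates over lines. Under the range assumption, the second region in the warnings lies entirely inside the first one.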

Does anyone have any idea?

Thanks!!

5 Replies
Cynakil
Contributor

Hi Blaze4up,

Did you get a fix or response for this problem?

We are experiencing exactly the same problem:

ESXi 6.7 U3 Host with 2x Tesla V100 GPUs

NVIDIA GRID Host Driver 10.3

VMs with vGPUs are randomly failing/crashing due to Memory region overlap errors.

VMs on the same hosts without vGPUs seem to be running fine.

dariusd
VMware Employee

What type of firmware is your virtual machine configured to use – BIOS or [U]EFI?  Our EFI implementation is much better at handling passthrough devices with large MMIO regions such as GPUs.

If you are currently using BIOS, though, it might be worth trying to set up a VM with EFI firmware instead.  Unfortunately most OSes can't simply be switched from BIOS boot to EFI boot without reinstalling the OS. 😞
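
A minimal sketch for checking which firmware a VM is configured with, assuming the setting lives in the VM's .vmx file as a `firmware = "efi"` line and that a missing `firmware` key means legacy BIOS. Both points are assumptions to verify against your own .vmx files, and the helper names `parse_vmx` and `firmware_type` are hypothetical.

```python
import re

# Simplified view of the .vmx format: one key = "value" pair per line.
VMX_LINE_RE = re.compile(r'^\s*([^#\s=]+)\s*=\s*"(.*)"\s*$')


def parse_vmx(text):
    """Parse .vmx text into a dict of key -> value strings."""
    settings = {}
    for line in text.splitlines():
        m = VMX_LINE_RE.match(line)
        if m:
            settings[m.group(1)] = m.group(2)
    return settings


def firmware_type(settings):
    """Return 'efi' or 'bios'.

    Assumption: a missing "firmware" key means the VM boots with BIOS.
    """
    return settings.get("firmware", "bios").lower()


# Hypothetical .vmx fragments for illustration only.
efi_vm = 'numvcpus = "12"\nfirmware = "efi"\n'
bios_vm = 'numvcpus = "12"\n'

print(firmware_type(parse_vmx(efi_vm)))
print(firmware_type(parse_vmx(bios_vm)))
```

Running this over the .vmx files of the affected VMs would show quickly whether all of the crashing ones are BIOS-based.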

--

Darius

Blaze4up
Enthusiast

Hi Darius,

Thanks for your reply, and sorry for my late one. Apparently I wasn't notified about it.

The VMs use BIOS and run CentOS 7. We use the GPUs with vGPU and the largest profile, 32 GB.

We are currently running NVIDIA GRID 10.0, so I will see whether the issue still appears.

If it does, do you suggest trying EFI?

Thanks, Gemma

Blaze4up
Enthusiast

Hi,

No, I haven't got a fix yet.

We have one GPU per host.

Indeed, VMs without vGPU are running fine, luckily.

If you have a fix and would like to share it, I would appreciate it very much!

Thanks, Gemma

dariusd
VMware Employee

If you have the time to create a VM with EFI firmware and install the OS into that, it might be a worthwhile experiment.

--

Darius
