FusionPenguin
Enthusiast

spurious APIC interrupt through vector ff on CPU#3, should never happen.

Hi,

I am running the latest VMware Workstation (14.1.2) on an Ubuntu 18.04 host, with Linux Mint 18.3 as the guest.

The system configuration is as follows:

- Motherboard: Gigabyte MD80-TM0 with 2 Intel Xeon CPUs running at 2 GHz (2.3 GHz in Turbo mode)
- 112 GB RAM
- Linux host installed on the NVMe drive on the motherboard
- VMware installed on the same drive
- Guest installed on a RAID 0 partition

If I fire up Linux Mint 18.3 as the guest and it stays idle, there are no problems.

But as soon as it is under heavy load (building and compiling source code), I get these errors:

spurious APIC interrupt through vector ff on CPU#3, should never happen.

And that error is really noticeable on the host, as it produces what I'd call "micro-freezes": the host becomes unresponsive for a fraction of a second (best case), sometimes for several seconds.

The error shows up about every 2 minutes 30 seconds.
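For anyone who wants to put numbers on the "micro-freezes", here is a minimal sketch of a stall probe to run on the host: it sleeps in short steps and flags any wake-up that arrives much later than requested. The function names (`probe`, `find_stalls`) and the thresholds are illustrative assumptions, not anything provided by VMware or the kernel, and a late wake-up only shows that the probing process was delayed, not why.

```python
import time


def probe(duration=1.0, interval=0.01):
    """Sleep in short steps for `duration` seconds and record wake-up times.

    Returns a list of time.monotonic() timestamps, one per wake-up.
    """
    samples = [time.monotonic()]
    end = samples[0] + duration
    while samples[-1] < end:
        time.sleep(interval)
        samples.append(time.monotonic())
    return samples


def find_stalls(samples, interval, threshold):
    """Return (timestamp, extra_delay) pairs for suspiciously long gaps.

    A gap counts as a stall when it exceeds the requested sleep `interval`
    by more than `threshold` seconds.
    """
    stalls = []
    for prev, cur in zip(samples, samples[1:]):
        extra = (cur - prev) - interval
        if extra > threshold:
            stalls.append((cur, extra))
    return stalls
```

Running `find_stalls(probe(duration=300, interval=0.01), 0.01, 0.2)` during a build would list every wake-up that was delayed by more than 0.2 seconds, which can then be compared against the times of the kernel messages.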

Any ideas? Are there any changes to make in the guest BIOS, or any options to enable or disable there?

This did not show up with Linux Mint 18 as the host 😞

Regards.

7 Replies
bluefirestorm
Champion

Try the workaround as suggested here

Re: spurious APIC interrupt on CPU#1, should never happen.

by using the configuration line

monitor_control.disable_hostedIPI = TRUE

FusionPenguin
Enthusiast

Hi,

Thanks for your answer.

Here's an extract of kern.log after I applied the fix recommended in that link. I had already applied it before, but to make sure, I applied it again after the 14.1.2 update.

May 29 19:01:05 budgie-penguin kernel: [87249.137071] spurious APIC interrupt through vector ff on CPU#32, should never happen.

May 29 19:04:04 budgie-penguin kernel: [87428.558173] spurious APIC interrupt through vector ff on CPU#0, should never happen.

May 29 19:04:57 budgie-penguin kernel: [87481.087593] spurious APIC interrupt through vector ff on CPU#7, should never happen.

May 29 19:05:01 budgie-penguin kernel: [87485.203175] spurious APIC interrupt through vector ff on CPU#30, should never happen.

May 29 19:05:09 budgie-penguin kernel: [87493.617576] spurious APIC interrupt through vector ff on CPU#36, should never happen.

May 29 19:05:27 budgie-penguin kernel: [87511.532570] spurious APIC interrupt through vector ff on CPU#0, should never happen.

May 29 19:05:34 budgie-penguin kernel: [87518.282722] spurious APIC interrupt through vector ff on CPU#36, should never happen.

May 29 19:06:12 budgie-penguin kernel: [87556.432330] spurious APIC interrupt through vector ff on CPU#0, should never happen.

May 29 19:06:24 budgie-penguin kernel: [87568.164605] spurious APIC interrupt through vector ff on CPU#10, should never happen.

May 29 19:06:24 budgie-penguin kernel: [87568.321033] spurious APIC interrupt through vector ff on CPU#9, should never happen.

May 29 19:06:40 budgie-penguin kernel: [87584.251429] spurious APIC interrupt through vector ff on CPU#35, should never happen.

May 29 19:06:41 budgie-penguin kernel: [87585.156602] spurious APIC interrupt through vector ff on CPU#32, should never happen.

May 29 19:06:41 budgie-penguin kernel: [87585.432895] spurious APIC interrupt through vector ff on CPU#40, should never happen.

May 29 19:06:43 budgie-penguin kernel: [87587.126744] spurious APIC interrupt through vector ff on CPU#0, should never happen.

May 29 19:06:44 budgie-penguin kernel: [87588.865279] spurious APIC interrupt through vector ff on CPU#37, should never happen.

May 29 19:06:59 budgie-penguin kernel: [87603.425499] spurious APIC interrupt through vector ff on CPU#17, should never happen.

May 29 19:07:02 budgie-penguin kernel: [87606.460577] spurious APIC interrupt through vector ff on CPU#12, should never happen.

May 29 19:07:04 budgie-penguin kernel: [87608.686080] spurious APIC interrupt through vector ff on CPU#7, should never happen.

May 29 19:07:05 budgie-penguin kernel: [87609.554378] spurious APIC interrupt through vector ff on CPU#5, should never happen.

May 29 19:07:05 budgie-penguin kernel: [87609.825266] spurious APIC interrupt through vector ff on CPU#13, should never happen.

May 29 19:07:07 budgie-penguin kernel: [87611.040290] spurious APIC interrupt through vector ff on CPU#15, should never happen.
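A log extract like the one above can be summarized per CPU and by time between events. The following is only a minimal sketch: it assumes the message format shown in this extract, and the function names (`parse_spurious`, `summarize`) are made up for illustration.

```python
import re
from collections import Counter

# Matches the kernel message format shown in the extract, e.g.
# "... kernel: [87249.137071] spurious APIC interrupt through vector ff on CPU#32, ..."
PATTERN = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+spurious APIC interrupt "
    r"through vector (?P<vector>[0-9a-f]+) on CPU#(?P<cpu>\d+)"
)


def parse_spurious(lines):
    """Return a list of (uptime_seconds, cpu) tuples for matching lines."""
    events = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            events.append((float(match.group("uptime")), int(match.group("cpu"))))
    return events


def summarize(events):
    """Count events per CPU and compute the mean gap between events in seconds.

    The mean gap is None when fewer than two events are present.
    """
    per_cpu = Counter(cpu for _, cpu in events)
    times = sorted(t for t, _ in events)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else None
    return per_cpu, mean_gap
```

Since `parse_spurious` accepts any iterable of lines, an open file object works directly, for example `summarize(parse_spurious(open("/var/log/kern.log", errors="replace")))`. The log path is an assumption and may differ between distributions.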

Unfortunately, it did not help at all 😞

The person posting the fix says those errors should be innocuous. If they were, I wouldn't mind. The problem is that they produce FREEZES of the host machine. Those freezes last anywhere from about 1 second to about 10 seconds.

So, they aren't innocuous at all.

I will try to install another system (Manjaro or Linux Mint) and check if those errors happen on those systems. If not, it is system/kernel related.

Regards.

bluefirestorm
Champion

jmattson was a VMware employee and, judging from his old posts, he would likely have been the go-to person for this kind of thing.

I assume that the VM was powered off, not suspended, when the /etc/vmware/config change was made. Otherwise the change may not take effect, as /etc/vmware/config is the place for host-wide settings, as opposed to the vmx file of an individual VM.
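To double-check that the setting is really present in the host-wide file, something like the sketch below could be used. It assumes simple `key = value` lines with optional double quotes around the value and `#` for comments; the helper name `read_vmware_setting` is hypothetical and not part of any VMware tooling.

```python
def read_vmware_setting(text, key):
    """Return the value assigned to `key` in `key = value` style text.

    Returns None when the key is absent. If the key appears more than
    once, the last assignment is returned.
    """
    value = None
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments (assumed to start with '#'),
        # and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, rest = line.partition("=")
        if name.strip() == key:
            value = rest.strip().strip('"')
    return value
```

For example, `read_vmware_setting(open("/etc/vmware/config").read(), "monitor_control.disable_hostedIPI")` should return `TRUE` if the line from the workaround was saved as posted.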

In the old thread I referred to, the CPUs involved were not Xeons, while your motherboard information suggests you have either an E5-26xx v3 or an E5-26xx v4. The interrupts generated would not have been intended for the VM OS, as Xeons from Ivy Bridge and newer generations support virtual-interrupt delivery (i.e. an interrupt generated within the VM OS would not cause a VM exit). Virtual-interrupt delivery seems to be exclusive to Xeon CPUs, as I haven't seen it in desktop/mobile class chips.

vmx| I125:   Virtual-interrupt delivery               {0,1}

Does it make any difference if you assign a single virtual socket or two virtual sockets to the VM? There is also a separate thread where a Windows 10 host with dual Ivy Bridge Xeons (HP Z820) would also experience freezes, but the same VM would run just fine on an older HP Z600 (simply by moving the disk from the Z820 to the Z600). So I am not holding out much hope that the single/dual virtual socket configuration will make any difference, but it is worth a try.

So it might be a confluence of factors: dual CPUs, some errant device driver, possibly a storage driver. But considering that you already mentioned it does not happen when the host is Linux Mint 18, it might be down to the kernel in your case.

FusionPenguin
Enthusiast

Hi,

Thanks for taking time to reply and to try to help me solve my problem.

There is one weird thing happening.

I still have those spurious APIC interrupt error messages (saying "it seems" would be an understatement: I still have them for sure).

BUT: now that I have applied the recommended patch, they seem to have become innocuous.

Because even if I have a bunch of those errors, I don't experience any micro-freezes anymore. Or if I do, they are so "micro" that I can't notice them anymore...

Like right now, a VM with Debian 9.4 is running and building an Android ROM.

I have had about 30 spurious APIC interrupts on various CPUs.

BUT: no freezes at all, or at least none that I could notice. And as long as I do not notice them, I don't really care.

Anyways, I'll keep you posted and will try to dig further because I suspect something else.

Will try to confirm what I suspect and report here.

Regards.

FusionPenguin
Enthusiast

Hi,

Well, I've tried several settings in the BIOS, changing C-State and changing VT-d settings.

It seems confirmed that the settings do not make the errors disappear, but they do let me keep working on the host without being plagued by small freezes.

The machine still seems to have "nano-freezes", but they are so short that they do not get in the way of basic tasks: browsing, Office document editing, mailing...

I also tried to use the same VM with VirtualBox.

Absolutely no problem with VirtualBox... but that software is awfully slow 😞

An Android build that takes about 95 minutes with VMware takes about 130 minutes with VirtualBox, roughly 37% more, even though I assigned it 28 cores, which VMware doesn't allow (unfortunately!).

I also tried booting Ubuntu with different kernels: 4.14.36, 4.15.10, 4.16.4... Same thing: errors, but only nano-freezes.

My guess ?

When I launch Workstation the first time, it says it has to build some tools to run.

VB doesn't do that.

And I assume that what is built goes into the kernel, as it asks to "rebuild" each time I switch kernels.

My two-cent guess is that this is the operation that produces the errors...

I think I have to wait for an updated Workstation...

I may add that I have enabled the installation of "development" updates...

Regards.

bluefirestorm
Champion

I am not a regular user of VirtualBox (used it once for a short period some years back inside a VMware VM); so I can't compare between VirtualBox and VMware Workstation in terms of performance.

I doubt if VT-d would make any difference as VMware Workstation does not support PCIe device passthrough.

I would avoid assigning more virtual CPUs than the VM can actually make use of. Assigning more virtual cores can lead to a perverse scenario where idle virtual CPUs preempt the virtual CPU(s) that are doing actual work. When a virtual CPU is waiting for an exclusive lock on a resource such as a file (likely typical for a compile/build), it could get preempted by the virtual CPU(s) that are idle. If I am not mistaken, on most x86 OSes an idle CPU still has to execute a HLT instruction. Although there is a CPU feature called PAUSE-loop exiting (available since Westmere) that is intended to prevent this scenario, it relies on timing parameters that have to be set by the hypervisor. So if the timings do not fit the actual VM workload, the virtual CPU(s) waiting for exclusive access to the resource can still get preempted by the idle virtual CPUs.

So you may want to try reducing the number of virtual CPUs in the VM that does the compilation/build work. If it can only realistically use 2-3 of them, assigning an excessive number such as 16 might actually be counterproductive.

As for the rebuild, I don't know exactly why Workstation/Player on Linux does that (it certainly does not happen with Workstation on Windows or Fusion on macOS), but Workstation/Player on Linux rebuilds the vmmon and vmnet modules whenever there is a host OS kernel change/update.
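To see whether the rebuilt modules are actually loaded on the running kernel, the module list in /proc/modules can be checked. A minimal sketch, with hypothetical helper names, assuming the usual /proc/modules layout where the module name is the first whitespace-separated field on each line:

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_vmware_modules(proc_modules_text, required=("vmmon", "vmnet")):
    """Return the required module names that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]
```

Calling `missing_vmware_modules(open("/proc/modules").read())` on the host should return an empty list when both modules were rebuilt and loaded after a kernel switch.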

FusionPenguin
Enthusiast

Hi,

And thanks for taking time to reply.

I decided to try switching to another Linux distribution. Instead of Ubuntu 18.04 I went with Manjaro 17.1.0, which is an Arch-based distro.

I have the exact same problem: the "spurious APIC interrupt through vector ff on CPU#17, should never happen" message comes up every minute or so (sometimes a little more, sometimes a little less).

And I confirm that it now produces more numerous but really short freezes... a fraction of a second. It's noticeable, but less problematic than the freezes of up to 6 or 7 seconds I had before.

The number of cores assigned to the VM is irrelevant, I think: I use a dual-Xeon motherboard with two Xeon 2690 v3 CPUs that have 14 cores, 28 threads each.

I think there's a real problem in the way VMware requests CPU resources... but that is just an opinion, I'm not technical enough to say.

Regards.
