I have just started looking at a company's existing virtual infrastructure, and I have come across a small behaviour that has me a little confused.
The situation is as below:
1 ESX server (2 quad-core CPUs, 34 GB of RAM, SAN attached).
VMware ESX 3.5 Update 2.
Only 1 VM running.
The VM running on the server is set to use 4 vCPUs. Using both the VMware Infrastructure Client and Veeam Monitor (free version), I can see that the VM's load is spread across CPU cores 4, 5, 6 and 7 on the ESX server.
When I look at the ESX server, the remaining cores 0, 1, 2 and 3 seem to be behaving a little oddly. Core 0 is running at around 50% all the time, and cores 1, 2 and 3 are hardly being utilised at all.
I have looked through the scheduling documents but I can't seem to find anything that relates to this problem.
Even when the VM is not running, core 0 is still running at 50%. I am guessing this has something to do with either the service console or the VMkernel, but as I say, I'm guessing.
Has anyone got any idea what is happening in this server?
The service console runs like a VM guest on core 0.
Look at this KB
Determining if IRQ Sharing Issues Affect Your System's Performance
The tell-tale sign of IRQ sharing between VMkernel and the service console is a high number of interrupts being serviced by PCPU0 (CPU #0 on the physical host) while the other CPUs are relatively lightly loaded. The high interrupt rates might sometimes render the service console unusable and cause a high variation in ESX Server performance.
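To check for the pattern the KB describes, you can total the interrupt counts per CPU and see whether CPU0 dwarfs the rest. The following is only a rough sketch: `irq_totals` is a made-up helper name, and it assumes the standard Linux `/proc/interrupts` layout (a header row of CPU names, then one row per IRQ with one count per CPU). The interrupts file on an ESX host may be laid out differently, so treat the parsing as an assumption to verify.

```shell
# Hypothetical helper: sum interrupt counts per CPU from a file in the
# standard Linux /proc/interrupts layout (assumed, not ESX-specific).
irq_totals() {
    awk '
        NR == 1 { ncpu = NF; next }    # header row: CPU0 CPU1 ...
        {
            # Fields 2..ncpu+1 hold the per-CPU counts; stop at the first
            # non-numeric field (rows such as ERR carry a single count).
            for (i = 2; i <= ncpu + 1 && i <= NF; i++) {
                if ($i ~ /^[0-9]+$/) total[i - 2] += $i
                else break
            }
        }
        END { for (c = 0; c < ncpu; c++) printf "CPU%d %d\n", c, total[c] }
    ' "${1:-/proc/interrupts}"
}
```

Calling `irq_totals` with no argument reads `/proc/interrupts`; pass a path to read a saved copy instead. Running it twice a few seconds apart and comparing the totals shows which CPU is actually servicing interrupts right now, rather than the counts accumulated since boot.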
The service console always runs on CPU0... So maybe a view of top CPU processes might lead you in the right direction.
"ps aux | sort -n -k 3,3 | tail -5"
50% used when idle seems a bit much...
It looks like we have a shared USB IRQ on the PCI bus. But I do not think it is the cause of the problem, as the interrupts file is showing no usage.
Thanks for the help with this.
I stopped kipmi0 from running as per the article and it had the desired effect. Everything is running as expected.
There are about 12 ESX servers, all running ESX 3.5 from Update 1 to Update 3. This problem is across the board on all of the servers.
Do you think this could be related to the Sun Fire servers themselves? This is the first time I have used Sun hardware and I'm currently less than impressed.