vmware_1234
Contributor

CPU Affinity across 2 different sockets

I have a dual-socket Intel Xeon quad-core system. My workloads are cache-intensive. I am running SUSE SLES10 SP1 as a guest OS on top of ESX 3.5. I am unable to set CPU affinity across the 2 physical CPUs. Setting CPU affinity to 0,1,2,3 or 4,5,6,7 works, but I am unable to set any other combination (0,2,4,6, etc.). The guest VM is using 4 vCPUs.

Does anyone know why the ESX server is not letting me pin the vCPUs to physical CPUs across 2 different sockets? The VM is not managed by VirtualCenter. Changes are made through the VI Client.
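
To illustrate the affinity sets in question, here is a minimal sketch that reports which sockets a given set touches. It assumes cores 0-3 sit on one socket and cores 4-7 on the other, which is only a guess consistent with the sets that work; the real numbering should be checked on the host. The helper names are hypothetical.

```python
from typing import Set

# Assumed layout for illustration only: 4 consecutive core IDs per socket.
CORES_PER_SOCKET = 4


def parse_affinity(text: str) -> Set[int]:
    """Parse a comma-separated core list such as '0,2,4,6' (spaces allowed)."""
    return {int(tok) for tok in text.split(",") if tok.strip()}


def sockets_spanned(cores: Set[int]) -> Set[int]:
    """Return the socket IDs touched by the given cores under the assumed layout."""
    return {core // CORES_PER_SOCKET for core in cores}


for spec in ("0,1,2,3", "4, 5,6,7", "0,2,4,6"):
    spanned = sorted(sockets_spanned(parse_affinity(spec)))
    print(f"{spec!r} touches socket(s) {spanned}")
```

Under that assumed layout, the two sets that work each stay on one socket, while 0,2,4,6 spans both.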

12 Replies
Ken_Cline
Champion

I'm not sure why it's not allowing you to pin across sockets, but I would encourage you to NOT use CPU affinity at all. The scheduler is cache aware and will try very hard to make sure that your workload stays on the most appropriate cores. CPU affinity is one of those features that - while it exists, and there are (very, very, very few) reasons to use it - usually should not be used. You've spent a lot of $$ on ESX, why not let it do what it's designed to do (it does it very well...)

Ken Cline

Technical Director, Virtualization

Wells Landers

VMware Communities User Moderator

Ken Cline VMware vExpert 2009 VMware Communities User Moderator Blogging at: http://KensVirtualReality.wordpress.com/
vmware_1234
Contributor

We certainly need to map the vCPUs to physical CPUs across different sockets. It does not look like the ESX server is completely cache aware, as we see 34% poorer performance under ESX than in native mode.

khughes
Virtuoso

It could be due to using 4 vCPUs to begin with. Have you tried lowering the vCPU count to 2? I'm sure that this is a very intensive server, which would be my only guess as to why you would assign that many vCPUs to it, but if you have more than a couple of other VMs on that ESX host, it's probably going to wait a while to get access to 4 cores all at once, while other smaller VMs are probably shooting data through like it's nothing.

  • Kyle

-- Kyle "RParker wrote: I guess I was wrong, everything CAN be virtualized "
Ken_Cline
Champion

Moved to the performance forum.

As khughes mentioned, you should try reducing the number of vCPUs allocated to your VM. If you have any contention on your host, then scheduling becomes difficult. The vmkernel uses relaxed co-scheduling of vCPUs, so it's not as bad as it used to be when strict co-scheduling was in use...but in most cases, the fewer the vCPUs the better.

What else is running on this host? Are you experiencing any contention? Before going the CPU affinity route, I'd suggest trying Reservation and Shares to give the vmkernel hints about how you want things managed.
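
To make the Reservation and Shares suggestion concrete, here is a toy model of how such settings could divide CPU when VMs compete. It is a simplification for illustration, not the vmkernel's actual algorithm: each VM first receives its reservation, and the spare capacity is then split in proportion to shares. The VM names and all numbers are hypothetical; the 24000 MHz capacity simply assumes 8 cores at 3.0 GHz.

```python
def split_under_contention(capacity_mhz, vms):
    """Toy proportional-share split.

    vms maps a VM name to (reservation_mhz, shares).
    Returns a dict of VM name -> MHz granted when every VM wants more CPU.
    """
    reserved = sum(res for res, _ in vms.values())
    if reserved > capacity_mhz:
        raise ValueError("reservations exceed host capacity")
    total_shares = sum(shares for _, shares in vms.values())
    if total_shares <= 0:
        raise ValueError("at least one VM needs a positive share value")
    spare = capacity_mhz - reserved
    return {
        name: res + spare * shares / total_shares
        for name, (res, shares) in vms.items()
    }


# Hypothetical host and VMs, for illustration only.
grants = split_under_contention(
    24000,
    {"big_vm": (12000, 2000), "small_vm": (0, 1000)},
)
for name, mhz in grants.items():
    print(f"{name}: {mhz:.0f} MHz")
```

In this toy split the VM with the reservation and twice the shares ends up with most of the host, which is the kind of hint meant above.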

mcowger
Immortal

VMware actually is aware of cache locality, and does make a best effort to schedule vCPUs onto the same physical CPU whenever possible, but lots of VMs with lots of vCPUs may prevent it from being able to do that.

--Matt

--Matt VCDX #52 blog.cowger.us
vmware_1234
Contributor

There is nothing else running on the server. There is just one VM running on top of the ESX server. Still, the performance is very poor.

On the CPU Affinity:

The funny thing is, even in a 2 vCPU case we are not able to pin the vCPUs to physical CPUs across different sockets.

khughes
Virtuoso

If nothing else is running on it and it's an 8-core box, I don't see how setting affinity is going to change anything. There is nothing running on the host that is going to challenge it for resources. Unfortunately, there are a small number of servers that shouldn't be virtualized, and this could be one of them.

Ken_Cline
Champion

What is the configuration of the host and the VM? How much RAM on the host, how much allocated to the service console, how much vRAM allocated to the VM? Do you have all of the RAM in the physical server balanced evenly across all of your pCPUs?

Ken_Cline
Champion

Pinning across sockets would likely give you the worst possible cache hit rate. The cache is local to the socket, so if you split the vCPUs across sockets, you're guaranteeing that a large percentage of your memory accesses are going to be to distant, rather than local, memory.

vmware_1234
Contributor

The host is a dual-socket Intel quad-core Xeon X5365 at 3.0 GHz. The machine is fully populated with 32 GB RAM. The host currently runs one guest.

The guest has 4 vCPUs and 16 GB RAM allocated and runs SUSE SLES10 SP1. As I indicated earlier, the guest performs 34% worse than the host running in native mode. The workload's memory consumption is well within the range.
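
A figure like 34% can be read two ways, as longer runtime or as lower throughput, and the two are not the same number. A small sketch of the arithmetic, using hypothetical timings that are not measurements from this system:

```python
def runtime_increase(native_s, virtual_s):
    """Fractional increase in elapsed time relative to the native run."""
    return (virtual_s - native_s) / native_s


def throughput_loss(native_s, virtual_s):
    """Fractional drop in work completed per second relative to native."""
    return 1.0 - native_s / virtual_s


# Hypothetical timings, chosen only to illustrate the arithmetic.
native_s, virtual_s = 100.0, 134.0
print(f"runtime increase: {runtime_increase(native_s, virtual_s):.1%}")
print(f"throughput loss:  {throughput_loss(native_s, virtual_s):.1%}")
```

With these made-up timings, a 34% longer runtime corresponds to roughly 25% lower throughput, so it helps to state which one is being quoted.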

Ken_Cline
Champion

When you run it in native mode, are you running it on the same platform? If so, then is it really a fair comparison - it would have access to all eight cores and all 32GB RAM.

vmware_12345
Contributor

In native mode, the workload loads up only four processors and the remaining four are completely idle. The memory consumption is not high at all. So it is a fairly good comparison, as only 4 CPUs are used in native mode.
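
One way to tighten the native comparison further would be to pin the native run to a fixed set of four cores, so both runs are limited to the same number of CPUs. A minimal sketch, assuming a Linux host with a Python version that provides `os.sched_setaffinity`; the helper name is hypothetical, and which core IDs share a socket must be checked on the machine itself.

```python
import os


def pin_current_process(requested):
    """Restrict this process to the requested CPUs that are actually available.

    Returns the affinity set in effect afterwards. Child processes started
    after this call inherit the same restriction.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is not available on this platform")
    available = os.sched_getaffinity(0)  # 0 means the calling process
    target = set(requested) & available
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)
```

For example, calling `pin_current_process({0, 1, 2, 3})` before launching the workload would keep it on those four cores, provided they exist on the machine.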
