VMware Cloud Community
bolsen
Enthusiast

Intel CPU - Enable hardware prefetch?

IBM enables the CPU hardware prefetch by default but Intel recommends turning the feature off depending on what the server is doing. Anyone have any preferences?

4 Replies
bolsen
Enthusiast

Hmmm, I'm starting to think it should be off.

RParker
Immortal

I think you would be wrong. Try it and see what happens.

Instruction supply may become a substantial bottleneck in future-generation processors that have very long memory latencies and run application workloads with large instruction footprints, such as database servers. Prefetching is a well-known technique for improving the effectiveness of the cache hierarchy.

employs a hardware-based breadth-first search of future control flow to cope with weakly biased future branches, prescient instruction prefetch uses precomputation to resolve which control-flow path to follow. Furthermore, as the precomputation frequently contains load instructions, prescient instruction prefetch often improves performance by prefetching data.

Prescient instruction prefetch uses helper threads to perform instruction prefetch on behalf of the main thread.

A key challenge for instruction prefetch is to accurately predict control flow sufficiently in advance of the fetch unit to tolerate the latency of the memory hierarchy. The notion of prescient instruction prefetch was first introduced as a technique that uses helper threads to improve single-threaded application performance by performing judicious and timely instruction prefetch.

bolsen
Enthusiast

Just found this in the System x Redbook.

BIOS levels permit various settings for performance in certain IBM System x servers.

Processor Adjacent Sector Prefetch

When this setting is enabled (enabled is the default for most systems), the processor retrieves both sectors of a cache line when it requires data that is not currently in its cache. When it is disabled, the processor will only fetch the sector of the cache line that includes the data requested. For instance, only one 64-byte line from the 128-byte sector will be prefetched with this setting disabled.
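
As a rough illustration of the arithmetic (this is not from the Redbook), the sketch below computes which bytes would be brought in on a miss with the setting enabled or disabled. It assumes the sizes quoted above, a 128-byte unit made of two 64-byte halves, and follows the naming of the first sentence (sectors within a line). The function and constant names are made up for this example.

```python
LINE_BYTES = 128    # assumed: full unit fetched when adjacent sector prefetch is on
SECTOR_BYTES = 64   # assumed: half of that unit, fetched alone when it is off


def fetched_bytes(addr, adjacent_sector_prefetch=True):
    """Return the half-open byte range (start, end) fetched on a miss at addr."""
    unit = LINE_BYTES if adjacent_sector_prefetch else SECTOR_BYTES
    start = addr - addr % unit  # round down to the unit boundary
    return start, start + unit


# A miss on byte address 200:
print(fetched_bytes(200, adjacent_sector_prefetch=True))   # (128, 256)
print(fetched_bytes(200, adjacent_sector_prefetch=False))  # (192, 256)
```

With the setting on, the neighbouring 64 bytes (128 to 191 here) come along whether or not the program ever touches them, which is where the extra bus traffic comes from.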

This setting can affect performance, depending on the application running on the server and memory bandwidth utilization. Typically, it affects certain benchmarks by a few percent, although in most real applications it will be negligible. This control is provided for benchmark users who want to fine-tune configurations and settings.

Processor Hardware Prefetcher

When this setting is enabled (disabled is the default for most systems), the processor is able to prefetch extra cache lines for every memory request. Recent tests in the performance lab have shown that you will get the best performance for most commercial application types if you disable this feature. The performance gain can be as much as 20% depending on the application. For high-performance computing (HPC) applications, we recommend you enable HW Prefetch, and for database workloads, we recommend you leave HW Prefetch disabled.

Both prefetch settings do decrease the miss rate for the L2/L3 cache when they are enabled but they consume bandwidth on the front-side bus which can reach capacity under heavy load. By disabling both prefetch settings, multi-core setups achieve generally higher performance and scalability.
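
To make the trade-off in that last paragraph concrete, here is a toy simulation (not from the Redbook, and not a model of Intel's actual prefetcher). It assumes a small fully associative LRU cache with 64-byte lines and a naive "fetch the next line on every miss" prefetcher, then counts demand misses and bus transfers for two access patterns. All names and sizes are made up for the example.

```python
from collections import OrderedDict

LINE_SIZE = 64  # assumed cache line size in bytes


def simulate(addresses, cache_lines=256, prefetch_next=False):
    """Return (demand_misses, bus_transfers) for a fully associative LRU cache."""
    cache = OrderedDict()  # line number -> None, ordered from LRU to MRU
    misses = 0
    transfers = 0

    def fill(line):
        # Bring one line in over the bus, evicting the LRU line if full.
        nonlocal transfers
        cache[line] = None
        transfers += 1
        if len(cache) > cache_lines:
            cache.popitem(last=False)

    for addr in addresses:
        line = addr // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)  # hit: mark as most recently used
        else:
            misses += 1
            fill(line)
            # Naive next-line prefetch, triggered only by a demand miss.
            if prefetch_next and (line + 1) not in cache:
                fill(line + 1)
    return misses, transfers


sequential = [i * LINE_SIZE for i in range(1024)]     # touches every line
strided = [i * 2 * LINE_SIZE for i in range(1024)]    # touches every other line

print(simulate(sequential))                      # (1024, 1024)
print(simulate(sequential, prefetch_next=True))  # (512, 1024)
print(simulate(strided))                         # (1024, 1024)
print(simulate(strided, prefetch_next=True))     # (1024, 2048)
```

In this toy model the sequential scan gets half the misses for the same bus traffic, while the strided pattern gets no fewer misses and twice the bus traffic. Real workloads sit somewhere in between, which fits the advice above to test with your own applications.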

FredPeterson
Expert

Based on that, bolsen, I'd venture to say it should be turned off for ESX hosts, given the nature of ESX: the processors keep flipping between different guests with different memory spaces and instructions. There would be dead time in the processor as it "forgets" what it prefetched for one guest while it waits to work for a different one.

Interesting. I'll have to keep it in mind when we replace our fleet this year (still stuck on regular Xeons on IBM Blades and 366)
