IBM enables the CPU hardware prefetcher by default, but Intel recommends turning the feature off depending on what the server is doing. Does anyone have a preference?
Hmmm, I'm starting to think it should be off.
I think you would be wrong. Try it and see what happens.
Instruction supply may become a substantial bottleneck in future-generation processors that have very long memory latencies and run application workloads with large instruction footprints, such as database servers. Prefetching is a well-known technique for improving the effectiveness of the cache hierarchy.
[...] employs a hardware-based breadth-first search of future control flow to cope with weakly biased future branches, prescient instruction prefetch uses precomputation to resolve which control-flow path to follow. Furthermore, as the precomputation frequently contains load instructions, prescient instruction prefetch often improves performance by prefetching data.
Prescient instruction prefetch uses helper threads to perform instruction prefetch on behalf of the main thread.
A key challenge for instruction prefetch is to accurately predict control flow sufficiently in advance of the fetch unit to tolerate the latency of the memory hierarchy. The notion of prescient instruction prefetch was first introduced as a technique that uses helper threads to improve single-threaded application performance by performing judicious and timely instruction prefetch.
Just found this in the System x Redbook.
BIOS levels permit various settings for performance in certain IBM System x
servers.
Processor Adjacent Sector Prefetch
When this setting is enabled (enabled is the default for most systems), the
processor retrieves both sectors of a cache line when it requires data that is
not currently in its cache. When it is disabled, the processor will only fetch the
sector of the cache line that includes the data requested. For instance, only
one 64-byte line from the 128-byte sector will be prefetched with this setting
disabled.
This setting can affect performance, depending on the application running on
the server and memory bandwidth utilization. Typically, it affects certain
benchmarks by a few percent, although in most real applications it will be
negligible. This control is provided for benchmark users who want to fine-tune
configurations and settings.
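
To make the adjacent sector arithmetic concrete, here is a minimal sketch of which 64-byte lines would be brought in on a miss. The function name `lines_fetched` is made up for illustration, and the code assumes the "adjacent sector" is the other half of the 128-byte aligned pair that contains the requested address; the excerpt above does not spell that out.

```python
LINE_SIZE = 64    # bytes per cache line, per the excerpt above
PAIR_SIZE = 128   # assumption: two adjacent lines form one 128-byte aligned pair


def lines_fetched(address, adjacent_sector_prefetch):
    """Return the start addresses of the lines fetched on a miss at `address`.

    Hypothetical helper for illustration only; it models the description
    in the excerpt, not any real BIOS or CPU interface.
    """
    line_start = address - (address % LINE_SIZE)
    if not adjacent_sector_prefetch:
        # Setting disabled: only the line holding the requested data.
        return [line_start]
    # Setting enabled: both 64-byte halves of the aligned 128-byte pair.
    pair_start = address - (address % PAIR_SIZE)
    return [pair_start, pair_start + LINE_SIZE]


print(lines_fetched(200, adjacent_sector_prefetch=False))  # [192]
print(lines_fetched(200, adjacent_sector_prefetch=True))   # [128, 192]
```

With the setting enabled, every miss moves twice as many bytes across the bus, which is where the bandwidth cost mentioned in the excerpt comes from.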
Processor Hardware Prefetcher
When this setting is enabled (disabled is the default for most systems), the
processor is able to prefetch extra cache lines for every memory request.
Recent tests in the performance lab have shown that you will get the best
performance for most commercial application types if you disable this feature.
The performance gain can be as much as 20% depending on the application.
For high-performance computing (HPC) applications, we recommend you enable
HW Prefetch, and for database workloads, we recommend you leave
HW Prefetch disabled.
Both prefetch settings do decrease the miss rate for the L2/L3 cache when they
are enabled, but they consume bandwidth on the front-side bus, which can reach
capacity under heavy load. By disabling both prefetch settings, multi-core setups
generally achieve higher performance and scalability.
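
The miss-rate versus bus-bandwidth trade-off described above can be illustrated with a toy model. This is only a sketch under loud assumptions: a tiny LRU cache, a simple next-line prefetcher, and made-up traces. The `simulate` function and its parameters are hypothetical and do not model the real Xeon prefetcher or front-side bus.

```python
import random
from collections import OrderedDict


def simulate(trace, cache_lines=256, prefetch=False):
    """Replay cache-line numbers through a small LRU cache.

    Returns (demand_misses, lines_transferred). With prefetch=True, each
    demand miss also pulls in the next sequential line if it is not cached.
    Toy model only: the sizes and the next-line policy are assumptions.
    """
    cache = OrderedDict()
    misses = 0
    transferred = 0

    def fill(line):
        nonlocal transferred
        cache[line] = True
        cache.move_to_end(line)        # mark as most recently used
        transferred += 1               # one line crosses the bus
        if len(cache) > cache_lines:
            cache.popitem(last=False)  # evict the least recently used line

    for line in trace:
        if line in cache:
            cache.move_to_end(line)
            continue
        misses += 1
        fill(line)
        if prefetch and (line + 1) not in cache:
            fill(line + 1)
    return misses, transferred


# Streaming access, loosely standing in for an HPC-style workload.
sequential = list(range(10_000))

# Scattered access, loosely standing in for a database-style workload.
rng = random.Random(0)
scattered = [rng.randrange(1_000_000) for _ in range(10_000)]

for name, trace in (("sequential", sequential), ("scattered", scattered)):
    for prefetch in (False, True):
        print(name, "prefetch" if prefetch else "no prefetch",
              simulate(trace, prefetch=prefetch))
```

In this model the sequential trace sees its demand misses halved with no extra traffic, while the scattered trace moves roughly twice as many lines for almost no reduction in misses. That is at least consistent with the HPC versus database recommendation above, but the only way to know for a real workload is to measure it on the actual hardware.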
Based on that, bolsen, I'd venture to say it's a turn-off for ESX hosts, given the nature of ESX and processors flipping around between different guests with different memory spaces and instructions. There would be dead time in the processor as it "forgets" what it prefetched for one guest while it waits to work for a different one.
Interesting. I'll have to keep it in mind when we replace our fleet this year (still stuck on regular Xeons on IBM Blades and 366).