by exDM69
5161 days ago
> Unless I have just been transported by a time warp I really think this is fiction.
Nope, the Intel guys and gals do this kinda magic day in, day out. Or at least the chips they manufacture do.
I'm not familiar with the internals of the prefetcher of any CPU at this level, but let me wave hands here. This is what the prefetcher could do:
All it takes is for the prefetcher to get a cache line when requested, then look inside it for pointer-sized, aligned values that look like pointers and are pretty close to the original cache line's (virtual) address. Now if these values happen to be sane virtual addresses in the current process, the prefetcher might as well fetch them one cache level closer to the CPU. If it hits, it might yield a big performance boost in real-world apps. If it misses, it's just a little wasted electricity. All modern CPUs do dirty little tricks like this if it helps them outshine their competitors.
Btw, you can add prefetch instructions to your code manually if you do linked list traversals or similar. In GCC you can use the __builtin_prefetch() compiler intrinsic.
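For the manual route, here is a minimal sketch of a linked list traversal with a prefetch hint. The `struct node` layout and the `sum_list` name are made up for illustration, and whether the hint helps at all depends on the CPU and on how much work is done per node, so treat it as something to measure rather than a guaranteed win.

```c
#include <stddef.h>

/* Hypothetical node type, for illustration only. */
struct node {
    struct node *next;
    long value;
};

/* Sum a linked list, hinting the next node into cache while the
 * current one is being processed. */
long sum_list(const struct node *head)
{
    long sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next) {
        /* Arguments: address, rw (0 = read), locality (0..3).
         * The builtin does not fault on a bad address, so passing
         * a NULL next pointer at the end of the list is harmless. */
        __builtin_prefetch(n->next, 0, 1);
        sum += n->value;
    }
    return sum;
}
```

With so little work per node the hint probably arrives too late to hide much latency; it tends to matter more when each node needs real processing or when you can prefetch several nodes ahead.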
Caches have limited size. If it misses, it also evicts something else from the cache. If that is what is actually needed, this costs performance.
"and are pretty close to the original cache line's (virtual) address"
Why does it have to be 'pretty close'?
"Now if these values happen to be sane virtual addresses in the current process"
That sanity check would involve visiting the paging tables, so it would require at least two indirections (http://lwn.net/Articles/253361/). If a cache line is 16 bytes, there would be at least 4 positions where a 32-bit pointer could be present. So at least four times two memory lookups would be needed. I think all of them would go through the same cache, but even assuming that the CPU has ways of signalling that it should not recurse, I do not think it is practical to do what you describe (disclaimer: I am not an expert on CPU design).
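To make the counting concrete, here is a toy software model of the scan being discussed, using the 16-byte line and 32-bit pointers assumed above. It is not how any real prefetcher is known to work: `scan_line`, `LINE_BYTES` and `NEAR_BITS` are hypothetical names, and "close" is arbitrarily taken to mean "shares the upper 12 address bits with the line". The filter itself needs no memory access; each surviving candidate would still need the address translation counted above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 16u  /* cache line size assumed in the comment above */
#define NEAR_BITS  20u  /* hypothetical: "close" = same upper 12 bits */

/* Collect the aligned 32-bit words of a line that look like nearby
 * pointers. Returns the number of candidates written to out, which
 * is at most LINE_BYTES / 4 = 4. */
size_t scan_line(const uint8_t line[LINE_BYTES], uint32_t line_addr,
                 uint32_t out[LINE_BYTES / 4])
{
    size_t found = 0;
    for (size_t off = 0; off < LINE_BYTES; off += sizeof(uint32_t)) {
        uint32_t word;
        memcpy(&word, line + off, sizeof word); /* safe unaligned read */
        /* Skip null words and anything outside the "near" region. */
        if (word != 0 && (word >> NEAR_BITS) == (line_addr >> NEAR_BITS))
            out[found++] = word;
    }
    return found;
}
```

In this model the worst case is still four candidates per line, each needing its own page table walk unless the translation is already cached.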
What is possible is to guess at where data is to be found. That allows CPUs to read and speculatively execute instructions from the physical memory that they think backs the virtual address of the PC while they, in parallel, do the lookup to verify that. See http://dl.acm.org/citation.cfm?id=2000101&dl=ACM&col.... I do not know whether this has made it into actual CPUs, though.