|
Disclaimer: Going off memory of the presentations myself, not at all an expert, etc. Ivan Goddard's descriptions of the branch predictor sounds quite simple that I can remember; The predictor itself is completely detached from the compiler, outside of the instruction layout falling into EBBs. A point that (I think is on the Mill forum somewhere) they've said about the prediction algorithim itself, is that they're doing simulations with nearly the simplest known prediction algorithm, and getting something like 90% success rates from that compared to the 98% the state-of-the-art sees. The difference is entirely in what data is carried by the prediction. (I believe this is in the forum thread on Switch statements, referenced in this video) What the predictor stores and delivers as output is the logical equivalent to an instruction pointer, and where in the EBB they exit, instead of a boolean (or an entire exit table). Which is more state, but not replacing the entire state of the processor. And, as explained in this presentation, the impact of having (effectively) an instruction pointer from the branch predictor is you know what to prefetch from memory. And that pointer, being associated with some other EBB that the predictor might know about, is also enough to ask the predictor for the next prediction, get a second prefetch, and so on. That also has implications to a point Goddard made in this presentation, in that some entire Switch statements have only one "EBB" (although that's loose terminology itself), and thus only one prediction associated with it. The prediction is to where the block will exit, not whether any specific prediction will succeed, thus there is no case where the processor will stall multiple times before finding the correct control flow. It can only fail once before it determines the correct exit to the block (in general, at least...) Which is I believe how they are expecting good results, rather than anything peculiar about how they determine the prediction from history. |
Let's assume that Mill can produce a prefetch chain though that can load additional code segments beyond the initial branch target. Does this really benefit the processor? I don't understand how a static prediction of where I will branch to next after an initial branch target is even useful. If I have already branched to that location before the code will be sitting in the L1 cache waiting for me, if it isn't, well, how does the branch predictor know more about the behavior of my program than the L1 cache?
To put it another way: if the instruction set doesn't fit in the L1 cache is there some predictable way that I can load and unload the cache just based on a given branch target( that in turn implies other branch targets will be loaded next with a total instruction set larger than the cache). I don't really think there is any possible way to predict the behavior of an entire program like this. Larger instruction segments and more branch targets imply more complex programs that are correspondingly harder to predict so I'm always better off just relying on the instruction cache.
Assuming Mill tries to load entire chains of instructions into the cache based on a simple static target I would be much better off just disabling this and relying on the instruction cache in almost all cases therefore.