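I may have misunderstood, but I believe step 1 (eliding loads) is simply an optimal caching problem. The optimal solution is the greedy "furthest in the future" eviction policy (Belady's algorithm): when a register is needed and none is free, evict the value whose next use is furthest away.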
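Here is a minimal sketch of the idea in Python. It assumes every register is interchangeable and that a load is the only cost. The function names and the "list of values in access order" input are my own and purely illustrative:

```python
import math
from typing import Hashable, Sequence


def next_use(accesses: Sequence[Hashable], start: int, value: Hashable) -> float:
    """Index of the next access to `value` at or after `start`, or infinity if never used again."""
    for j in range(start, len(accesses)):
        if accesses[j] == value:
            return j
    return math.inf


def count_loads_belady(accesses: Sequence[Hashable], num_registers: int) -> int:
    """Count loads when evicting the resident value whose next use is furthest in the future."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    registers: set = set()
    loads = 0
    for i, value in enumerate(accesses):
        if value in registers:
            continue  # already in a register, so this load is elided
        loads += 1
        if len(registers) >= num_registers:
            # Evict the value needed furthest in the future (ties only occur
            # between never-used-again values, so the load count is unaffected).
            victim = max(registers, key=lambda v: next_use(accesses, i + 1, v))
            registers.remove(victim)
        registers.add(value)
    return loads


print(count_loads_belady(list("abcabdab"), 2))
```

For the access sequence `a b c a b d a b` with two registers this performs 6 loads, and with three registers 4 loads. The naive linear scan in `next_use` keeps the sketch short. A real allocator would precompute next-use positions.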
That's an excellent point. I hadn't heard of "furthest in the future", but it looks like it does solve step 1. Beyond that, though, I doubt it can be used, because each of the 6502's registers is different and they don't support the same operations. It's a good idea, though, and it might work for a RISC architecture where all registers behave the same.