by sitkack 4421 days ago
This is just the start. The next phase is to overclock for as long as the transistor error budget stays within bounds, which requires heat and error sensors all over the die.

The other piece is functional instruction packet transactions that can be retried somewhere else if they fail during processing.
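A minimal sketch of that retry idea, assuming each packet is a pure function of its inputs (so re-running it elsewhere is safe) and that some error sensor can flag a failed packet. The names `TransientFault`, `make_unit`, and `run_packet` are made up for illustration, and the "units" are simulated:

```python
class TransientFault(Exception):
    """Raised when a (hypothetical) error sensor flags a failed packet."""


def make_unit(fail_first_n):
    """Return a simulated execution unit that faults on its first n packets."""
    state = {"remaining": fail_first_n}

    def unit(op, *args):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise TransientFault()
        return op(*args)

    return unit


def run_packet(op, args, units, max_attempts=8):
    """Run op(*args) as a transaction, moving to the next unit on a fault.

    Returns (result, attempts). Only safe because op has no side effects.
    """
    for attempt in range(max_attempts):
        unit = units[attempt % len(units)]
        try:
            return unit(op, *args), attempt + 1
        except TransientFault:
            continue  # discard the failed attempt and retry elsewhere
    raise RuntimeError("packet failed on every attempt")


# The first unit faults once, so the packet completes on the second unit.
units = [make_unit(1), make_unit(0)]
result, attempts = run_packet(lambda a, b: a + b, (2, 3), units)
print(result, attempts)
```

The key assumption is the "functional" part: a packet with side effects could not simply be thrown away and rerun.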

With these changes, CPUs will always be operating in some predetermined error-rate regime. No more overclocking; just change the AER (allowable error rate) register, which would be set loosely for computations that don't matter much, like game simulations, and tightly for ones that do, like Excel handling your payroll.
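Roughly what I mean by the AER register, as a toy model. Everything here is hypothetical: `pick_clock_mhz` is a made-up name, and the error rates are invented numbers standing in for what on-die error sensors might report.

```python
def pick_clock_mhz(aer, measured):
    """Return the fastest clock whose measured error rate is within `aer`.

    `measured` maps clock speed (MHz) to observed errors per operation.
    Returns None if no clock satisfies the allowable error rate.
    """
    ok = [mhz for mhz, rate in measured.items() if rate <= aer]
    return max(ok) if ok else None


# Made-up numbers, purely for illustration.
measured = {3000: 1e-15, 3500: 1e-12, 4000: 1e-9, 4500: 1e-5}

print(pick_clock_mhz(1e-6, measured))   # loose AER, e.g. a game
print(pick_clock_mhz(1e-13, measured))  # tight AER, e.g. payroll
```

The software only states how much error it can live with; the clock speed falls out of that.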

2 comments

This is not so far from reality. Ben Zorn has done some work on this -- in particular, his work on "Flikker" might be interesting to you. The paper (http://research.microsoft.com/en-us/um/people/moscitho/publi...) was published at ASPLOS'11.

In general, this concept is called "good enough computing"; periodically, people think about it and then let it fall by the wayside. But it is a neat thought experiment, if nothing else!

I have been toying with the design of a floating point processor that has configurable precision for each operation, but I don't yet know enough about the strict needs of numerical computation.
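A software sketch of what per-operation precision could look like, assuming the precision is given as a number of significant binary digits and the result is simply rounded after each operation. `round_to_precision`, `fadd`, and `fmul` are names I made up; this only emulates the rounding, not exponent range limits or any hardware behaviour.

```python
import math


def round_to_precision(x, bits):
    """Round x to `bits` significant binary digits (ties to even)."""
    if x == 0.0 or not math.isfinite(x):
        return x
    m, e = math.frexp(x)      # x == m * 2**e with 0.5 <= |m| < 1
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)


def fadd(a, b, bits):
    """Add, then round the result to the requested precision."""
    return round_to_precision(a + b, bits)


def fmul(a, b, bits):
    """Multiply, then round the result to the requested precision."""
    return round_to_precision(a * b, bits)


# Same product at three precisions: 53 bits (double), 24 (single), 11 (half).
for bits in (53, 24, 11):
    approx = fmul(math.pi, math.e, bits)
    print(bits, approx, abs(approx - math.pi * math.e))
```

The relative rounding error of each operation is at most 2**-bits, so the open question is which operations in a real numerical code can live with a larger bound.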

We already do this with SP, DP, EP, bignum, arbitrary precision, and algorithms that are precision-tolerant, so I am not sure how much of an advantage it would have.

One idea I had would be to decompile a high-performance benchmark and then synthesize microbenchmarks for groups of basic blocks to get instruction packet timing for various FP operations, and then model the distribution of speedups from using lower-precision math.

These papers look interesting

http://isl.korea.ac.kr/paper/TVLSI_May2004.pdf

http://passat.crhc.illinois.edu/rakeshk/dsn_13_cam.pdf

If you enjoy this kind of stuff, you will also enjoy Michael Carbin's work. Slides from OOPSLA'13:

http://people.csail.mit.edu/mcarbin/slides/oopsla13.pdf

I have been enjoying both of these immensely.
Sorry if I've misunderstood you, but I can't imagine many/any programs (even games) that would function well, or at all, if random transistor errors were introduced.
If the errors are confined to specific operations and specific parts of a computer, such as a floating point unit, then in some cases a certain level of random error can be acceptable.
For example, bit flips confined to the least significant bits (LSBs) can be tolerated, and GPU workloads can be tolerant of errors.
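A quick way to see why it matters where the flip lands, assuming values are stored as IEEE 754 doubles. `flip_bit` and `relative_error` are just helper names for this sketch:

```python
import struct


def flip_bit(x, bit):
    """Flip one bit (0 = least significant) of x's IEEE 754 double encoding."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (y,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return y


def relative_error(x, bit):
    """Relative change in x caused by flipping the given bit."""
    return abs(flip_bit(x, bit) - x) / abs(x)


x = 1000.0
print(relative_error(x, 0))    # lowest mantissa bit
print(relative_error(x, 30))   # middle of the mantissa
print(relative_error(x, 61))   # a high exponent bit
print(flip_bit(x, 63))         # sign bit
```

A flip in the low mantissa bits changes the value by a tiny fraction, while a flip in the exponent or sign changes it by orders of magnitude or negates it, so "some error is fine" only holds if the errors can be kept to the low-order bits.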