| HN Mirror


by shriver 2908 days ago

Actually it's not as clear cut as you'd expect. Obviously you can't represent every number in floating point, so you have to choose a way to round numbers, and for simple operations like add you can correctly round the results. For transcendental operations like x^y, it's actually unknown how many resources you'd need to correctly round the result for every valid value of x and y[1]. Since you can't guarantee correct rounding, you have to choose an error bound for your approximation, like 3 units of least precision (ULP) at the output. Of course we all need to know how accurate these are, so the OpenCL standard specifies it[2]: exp is required to be correct to 3 ULP for single/double and 2 ULP for half.
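To make "correct to N ULP" concrete, here is a minimal sketch of how you might measure the ULP error of an exp implementation. It assumes Python 3.9+ (for `math.ulp`), checks the double-precision `math.exp` rather than any OpenCL implementation, and uses `decimal` at 60 digits as the high-precision reference. The helper name `exp_ulp_error` is made up for illustration.

```python
import math
from decimal import Decimal, localcontext


def exp_ulp_error(x: float) -> float:
    """Error of math.exp(x) in ULPs, measured against a 60-digit reference."""
    got = math.exp(x)
    with localcontext() as ctx:
        ctx.prec = 60                      # far more digits than a double holds
        reference = Decimal(x).exp()       # high-precision exp of the exact input
        # Decimal(float) converts exactly, so the difference is the true error.
        error = abs(Decimal(got) - reference) / Decimal(math.ulp(got))
    return float(error)


if __name__ == "__main__":
    for x in (0.5, 1.0, -10.5, 20.25):
        print(f"exp({x}): {exp_ulp_error(x):.3f} ULP")
```

An implementation that meets a 3 ULP bound would report values no greater than 3 for every input; checking that exhaustively is the hard part.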

Now if you have 3 ULP to play with, the maker of an Intel CPU is going to design an exp instruction that makes the best use of the existing Intel functional units. But an Intel FPGA dev is going to design an exp implementation that makes the best use of lookup tables and 18x18 multipliers, because that's what they have on the FPGA.
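As a rough illustration of the lookup-table style, here is a sketch of an exp built from range reduction, a small table, and a short polynomial. This is not Intel's design: the table size, the polynomial degree, and the name `lut_exp` are all assumptions chosen for the example, and it works in double precision for simplicity.

```python
import math

LN2 = math.log(2.0)
TABLE_BITS = 5                         # 32 table steps per unit (arbitrary choice)
STEP = 1.0 / (1 << TABLE_BITS)
# In hardware this would be a ROM of precomputed constants.
TABLE = {j: math.exp(j * STEP) for j in range(-12, 13)}


def lut_exp(x: float) -> float:
    """Table-plus-polynomial exp; a hypothetical stand-in for a hardware design."""
    k = round(x / LN2)                 # exp(x) = 2**k * exp(r)
    r = x - k * LN2                    # |r| <= ln(2)/2
    j = round(r / STEP)                # exp(r) = TABLE[j] * exp(t)
    t = r - j * STEP                   # |t| <= STEP/2
    # Degree-6 Taylor polynomial for exp(t), evaluated with Horner's rule.
    poly = 1.0 + t * (1.0 + t * (0.5 + t * (1/6 + t * (1/24 + t * (1/120 + t / 720)))))
    return math.ldexp(TABLE[j] * poly, k)


if __name__ == "__main__":
    for x in (0.1, 1.0, -2.5, 4.75):
        a, b = lut_exp(x), math.exp(x)
        print(f"x={x}: lut={a!r} native={b!r} diff={(a - b) / math.ulp(b):+.1f} ULP")
```

Both functions are reasonable approximations of exp, but they round at different intermediate steps, so their last few bits need not agree.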

So whilst you'll get an answer within 3 ULP of the true value on both Intel CPU and Intel FPGA, the rounding errors are going to differ between the two architectures. So if you want to compute a normal distribution on Intel FPGA vs CPU, you'll get up to 3 ULP of error in the exponential on each, and that difference carries forward into the rest of the equation.
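Here is a small sketch of that carry-forward. It assumes Python 3.9+ (for `math.nextafter`) and fakes a "different but still compliant" exp by nudging the native result up by 3 representable doubles; `nudge` and `exp_off_by_3` are hypothetical helpers, not anything from a real SDK.

```python
import math


def nudge(value: float, ulps: int) -> float:
    """Step `value` upward by `ulps` representable doubles."""
    for _ in range(ulps):
        value = math.nextafter(value, math.inf)
    return value


def exp_off_by_3(x: float) -> float:
    """Stand-in for another architecture's exp: same value, 3 ULP higher."""
    return nudge(math.exp(x), 3)


def normal_pdf(x: float, mu: float, sigma: float, exp=math.exp) -> float:
    """Normal density with the exp implementation passed in."""
    z = (x - mu) / sigma
    return exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    a = normal_pdf(1.0, 0.0, 1.0)
    b = normal_pdf(1.0, 0.0, 1.0, exp=exp_off_by_3)
    print(a.hex())
    print(b.hex())
    print("bit-identical:", a == b)
```

The two densities agree to many digits but are not bit-identical, which is exactly what breaks a bit-exact comparison between an emulator and the hardware.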

So now you have a choice. Do you use the built-in exp on the Intel CPU, which is OpenCL compliant just like the FPGA, and accept unknown differences in rounding in what is probably a numerically sensitive task? Or do you emulate the actual sub-operations the FPGA does? In that case the hardy RTL designer who wrote the exp RTL is going to have to write an implementation in C that emulates the hardware. Oh, and they don't only have to do that for exp: they have to do it for hundreds of mathematical functions, and it'll run dog slow on the CPU compared to using the native functions.

[1] https://en.wikipedia.org/wiki/Rounding#Table-maker's_dilemma

[2] https://www.khronos.org/registry/OpenCL/specs/opencl-2.1-env...

1 comment

kjeetgill 2908 days ago

> do you emulate the actual sub-operations the FPGA does?

Yes. It's an emulator.

> ... is going to have to write an implementation in C that emulates the hardware.

Makes sense. It's an emulator.

> ... and it'll run dog slow on the CPU compared to using the native functions.

Isn't that to be expected? It's an emulator. This isn't like games where it just has to look close. If it's a dev tool for testing correctness, exactness matters.


shriver 2907 days ago

Well, last time I looked, the answer to your first question (does it emulate the actual sub-operations the FPGA does?) was actually no for the Intel OpenCL SDK.

And while yes, it's expected to be slow compared to the native functions, that's not the problem. It's slow compared to simulation.
