Hacker News
by chas 1707 days ago
Integer arithmetic is still simpler to implement in hardware, and therefore faster, than floating-point arithmetic, so it is still heavily used in resource-constrained numerical programs. This shows up in signal processing code for, e.g., very low-level network software, radios, and image processing. It is also popular for efficient neural net inference. In neural nets, it is usually paired with reduced precision, e.g. int8, because in addition to getting more operations per second, you can fit more weights per byte of L1 cache and move more weights per byte-per-second of memory bandwidth. Since working around those memory limits is a major part of performance optimization, reduced-precision fixed-point can help a lot. I should also mention that resource constraints show up both in high-end hardware, where you are trying to get maximum performance/throughput, and in low-end hardware, where you are trying to make things possible at all, so fixed-point arithmetic may be more widespread than you would expect.
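To make the int8 point concrete, here is a minimal Python sketch of quantized inference arithmetic. It assumes symmetric per-tensor quantization with a scale of max|x|/127, which is one common scheme but not the only one, and the helper names `quantize_int8` and `int8_dot` are hypothetical. The inner loop is pure integer multiply-accumulate; the real-valued scales are applied once at the end.

```python
def quantize_int8(values, scale):
    """Map real values to int8 codes: q = round(x / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]

def int8_dot(qa, qb, scale_a, scale_b):
    """Dot product of two int8 vectors using only integer multiply-adds,
    converted back to a real value once at the end."""
    acc = 0  # hardware would typically use an int32 accumulator here
    for x, y in zip(qa, qb):
        acc += x * y  # |x * y| <= 127 * 127, so each product fits in a signed 16-bit value
    return acc * scale_a * scale_b

weights = [0.5, -1.25, 0.75, 2.0]
inputs = [1.0, 0.5, -0.25, 0.125]
scale_w = max(abs(w) for w in weights) / 127
scale_x = max(abs(x) for x in inputs) / 127
qw = quantize_int8(weights, scale_w)
qx = quantize_int8(inputs, scale_x)

approx = int8_dot(qw, qx, scale_w, scale_x)
exact = sum(w * x for w, x in zip(weights, inputs))
print(exact, approx)
```

Each int8 weight takes one byte instead of the four a float32 needs, which is where the cache and memory-bandwidth savings come from.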

> Integer arithmetic is still simpler to implement in hardware and therefore faster than floating point arithmetic

This is true in the abstract, but not necessarily true of any specific commodity chip. Processor vendors spend a lot of silicon on low-latency, high-throughput floating-point support. Fast int8 and bfloat16 vector support is a fairly recent addition, which vendors made after the ML craze demonstrated demand for vector support for more bandwidth-friendly datatypes.
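As an illustration of why bfloat16 is considered bandwidth-friendly: it keeps float32's sign bit and 8-bit exponent and cuts the stored significand to 7 bits, so converting is essentially keeping the top 16 bits of the float32 pattern, with rounding. The sketch below assumes round-to-nearest-even and skips NaN handling; all function names are hypothetical.

```python
import struct

def float32_bits(x: float) -> int:
    """Bit pattern of x after rounding it to IEEE 754 float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_float32(b: int) -> float:
    """Interpret a 32-bit pattern as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def to_bfloat16_bits(x: float) -> int:
    """Round a float32 to bfloat16 (nearest, ties to even); returns the 16-bit pattern.
    NaN inputs are not handled in this sketch."""
    bits = float32_bits(x)
    lsb = (bits >> 16) & 1          # lowest bit that survives the truncation
    rounded = bits + 0x7FFF + lsb   # adds half an ulp, breaking ties toward even
    return (rounded >> 16) & 0xFFFF

def from_bfloat16_bits(h: int) -> float:
    """Widen a bfloat16 pattern back to float32 by zero-filling the low 16 bits."""
    return bits_float32(h << 16)

print(hex(to_bfloat16_bits(3.14159265)), from_bfloat16_bits(to_bfloat16_bits(3.14159265)))
```

The conversion costs little more than a shift, and each value now takes 2 bytes instead of 4, at the price of roughly 2 to 3 significant decimal digits.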

It is very dependent on your specific hardware. I would expect dedicated digital signal processor chips to almost always support a high-performance fixed-point multiply-add instruction. In contrast, I would expect chips targeted at HPC or scientific computing to be much more focused on double-precision throughput than anything else.
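Roughly what a DSP's fixed-point multiply-add does in a filter loop, as a minimal Python model. It assumes the common Q15 format (int16 values scaled by 2^15, covering [-1, 1)), and Python's unbounded integer stands in for a DSP's wide accumulator; the names and filter coefficients are hypothetical.

```python
Q = 15
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1

def to_q15(x: float) -> int:
    """Convert a real value to Q15, saturating at the int16 limits."""
    return max(INT16_MIN, min(INT16_MAX, round(x * (1 << Q))))

def from_q15(q: int) -> float:
    return q / (1 << Q)

def q15_dot(a, b):
    """Multiply-accumulate over Q15 vectors, then round, rescale, and saturate."""
    acc = 0
    for x, y in zip(a, b):
        acc += x * y                          # Q15 * Q15 -> Q30 product
    acc = (acc + (1 << (Q - 1))) >> Q         # round, then rescale Q30 -> Q15
    return max(INT16_MIN, min(INT16_MAX, acc))  # saturate instead of wrapping around

coeffs = [to_q15(c) for c in (0.25, 0.5, 0.25)]   # a tiny hypothetical FIR filter
samples = [to_q15(s) for s in (0.5, -0.5, 0.75)]
print(from_q15(q15_dot(coeffs, samples)))
```

Saturation matters here: on overflow, a wrapped result flips sign and produces a loud click in audio, while a clipped result stays close to the true value.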

At least as far as int32 vs. float32 goes, float is surprisingly easier to make fast in hardware. This is because floats are composed of multiple sections that can be processed in parallel, whereas all of the bits of an integer addition, for example, have a serial dependency.
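A rough Python model of the "multiple sections" point. A float32 is three bit fields, and for a multiply the sign, exponent, and significand are handled by separate, narrower integer operations: a 1-bit XOR, a small exponent add, and a 24x24-bit significand multiply, instead of one 32x32-bit multiply. The names are hypothetical, and the sketch only handles normal, finite inputs. It truncates instead of rounding as IEEE 754 requires.

```python
import struct

def float32_fields(x: float):
    """Split an IEEE 754 float32 into (sign, biased exponent, fraction) bit fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent (bias 127)
    fraction = bits & 0x7FFFFF       # 23 stored bits; normal numbers have an implicit leading 1
    return sign, exponent, fraction

def mul_fields(a: float, b: float) -> float:
    """Simplified float32 multiply built from per-field integer operations.
    No rounding, overflow, underflow, or subnormal handling."""
    sa, ea, fa = float32_fields(a)
    sb, eb, fb = float32_fields(b)
    sign = sa ^ sb                          # sign: 1-bit XOR
    exp = ea + eb - 127                     # exponent: small integer add, remove one bias
    sig = (fa | 1 << 23) * (fb | 1 << 23)   # significand: 24x24-bit multiply, 48-bit product
    if sig >> 47:                           # product of significands is in [2, 4): renormalize
        sig >>= 1
        exp += 1
    frac = (sig >> 23) & 0x7FFFFF           # keep 23 fraction bits (truncation)
    bits = sign << 31 | exp << 23 | frac
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(float32_fields(-6.5), mul_fields(-6.5, 3.0))
```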

The relative performance of float32 vs. int32 is heavily dependent on the specific operations you care about and what hardware resources (e.g., area, power, standard cells) you have available.

While floating-point numbers can be cleanly split into a mantissa and an exponent, adding floats requires aligning the exponents by shifting one of the mantissas, which can be an expensive operation. Each portion of a floating-point operation is also implemented with integer arithmetic, which limits the performance spread. Those integer sub-operations often need significantly fewer bits, though (a float32 has a 24-bit significand versus a full 32-bit integer), which can lead to major speedups.
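To show where the alignment cost comes from, here is a minimal sketch of float32 addition, limited to positive, normal, finite inputs and truncating instead of rounding; the helper names are hypothetical. The smaller operand's significand is shifted right by the exponent difference before a plain integer add, and a carry-out forces a renormalizing shift. In hardware the alignment step needs a variable-distance (barrel) shifter. Subtraction would also need a leading-zero count to renormalize, which this sketch omits.

```python
import struct

def float32_fields(x: float):
    """Split an IEEE 754 float32 into (sign, biased exponent, fraction) bit fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def add_positive(a: float, b: float) -> float:
    """Simplified float32 addition for positive, normal, finite inputs (truncating)."""
    _, ea, fa = float32_fields(a)
    _, eb, fb = float32_fields(b)
    ma, mb = fa | 1 << 23, fb | 1 << 23   # restore the implicit leading 1 (24-bit significands)
    if ea < eb:                           # make `a` the operand with the larger exponent
        ea, eb, ma, mb = eb, ea, mb, ma
    mb >>= ea - eb                        # alignment shift by the exponent difference
    m, e = ma + mb, ea                    # ordinary 24-bit integer add
    if m >> 24:                           # carry out of the top bit: renormalize
        m >>= 1
        e += 1
    bits = e << 23 | (m & 0x7FFFFF)       # drop the implicit 1 again
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(add_positive(1.5, 0.25), add_positive(3.0, 1.0))
```

Note that `add_positive(1.0, 2**-30)` returns 1.0: the small operand is shifted out entirely during alignment.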

On the integer side, the serial carry dependence means ripple-carry adders are usually a bummer, but there are a ton of carry-lookahead variants that can lessen the performance impact of that dependency. If you have several integer additions at once, you can use a carry-save variant to pay that serial cost only once, after all of the additions are done. Finally, if you are willing to significantly change your integer encoding, there are tools like redundant number systems[0] that let you move the traditional trade-offs for arithmetic circuits around, including removing the serial dependency on carries completely. That said, if you need to rescale your fixed-point numbers during the computation, the fixed-point implementation will require more integer operations than the floating-point implementation requires floating-point operations.
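A small Python model of these carry trade-offs. The helper names are hypothetical, and real adders are circuits, not loops. `ripple_carry_add` makes the serial carry chain explicit. `carry_save_add` is a 3:2 compressor that reduces three operands to a sum word and a carry word with no carry propagation, since every bit position is computed independently. `add_many` keeps the running total in that (sum, carry) form, which is itself a simple redundant representation, and pays for one carry-propagating add at the end.

```python
def ripple_carry_add(a: int, b: int, width: int = 32) -> int:
    """Bit-by-bit ripple-carry adder: bit i cannot be finished until carry i arrives."""
    result, carry = 0, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result  # the final carry is dropped, so this wraps modulo 2**width

def carry_save_add(a: int, b: int, c: int):
    """3:2 compressor: a + b + c == sum_word + carry_word, with no carry chain."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word

def add_many(values, width: int = 32) -> int:
    """Sum several operands with carry-save adds, then one carry-propagating add."""
    mask = (1 << width) - 1
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
        s, c = s & mask, c & mask  # keep both words at the hardware width
    return ripple_carry_add(s, c, width)

print(add_many([1, 2, 3, 4, 5]), add_many([0xFFFFFFFF, 1]))
```

A carry-lookahead adder would replace the loop in `ripple_carry_add` with generate/propagate signals computed in parallel. It gives the same result with a shorter critical path, at the cost of more gates.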

All of this also depends heavily on the overall architecture of the chip, since the number of integer units vs. float units and how data is moved around can have a much bigger impact on performance than how each individual arithmetic operation is implemented.

[0] http://lux.dmcs.pl/csII/ca2_RedundantNS.pdf