by adrian_b
1340 days ago
For general computational applications (i.e. not special graphics cases), implementing double-precision operations with single-precision operations is considerably more complicated than implementing quadruple-precision operations with double-precision operations. The reason is that extending the precision of the 32-bit FP numbers is not enough: the exponent range must also be extended. Standard double-precision numbers have an exponent range (normal magnitudes from about 2.2e-308 to 1.8e308) that is large enough to make underflow and overflow very unlikely in most algorithms. With the much smaller exponent range of FP32 numbers (about 1.2e-38 to 3.4e38), underflow and overflow are very likely, and any double-precision implementation must correct for this. So it is not enough to use two FP32 numbers to represent one FP64 number. One must either use a third number for the exponent, or make at least one of the two 32-bit numbers an integer partitioned into exponent and significand fields. Both approaches lead to much more complex algorithms, and to a much worse speed ratio for FP64 implemented with FP32 than for FP128 implemented with FP64.
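A minimal Python sketch of the range problem, assuming FP32 rounding is emulated by packing Python floats (which are FP64) with the standard `struct` module; all function names here are hypothetical. A plain pair of FP32 numbers does extend the precision of 1/3, but it cannot hold 1e300 (overflow) or 1e-50 (flushed to zero). Adding a separate integer exponent, as suggested above, recovers both. This only shows representation, not arithmetic on such numbers, which is where the real complexity lies.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round an FP64 value to the nearest FP32 value.

    struct raises OverflowError for finite values beyond the FP32 range
    (about 3.4e38); values far below it (under about 1.4e-45) become 0.0.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def split_pair(x: float) -> tuple[float, float]:
    """Plain double-float: x ~ hi + lo, both parts FP32 (~48 significand bits)."""
    hi = to_f32(x)
    lo = to_f32(x - hi)  # x - hi is exact in FP64, so only lo gets rounded
    return hi, lo


def split_with_exponent(x: float) -> tuple[float, float, int]:
    """Hypothetical extended variant: x ~ (hi + lo) * 2**e.

    The separate integer exponent e carries the range, so hi and lo only
    hold a significand in [0.5, 1) and cannot overflow or underflow.
    """
    m, e = math.frexp(x)  # x == m * 2**e with 0.5 <= |m| < 1 (or m == 0)
    hi, lo = split_pair(m)
    return hi, lo, e


def join_with_exponent(hi: float, lo: float, e: int) -> float:
    """Rebuild the FP64 value from the (hi, lo, e) triple."""
    return math.ldexp(hi + lo, e)


if __name__ == "__main__":
    x = 1.0 / 3.0
    hi, lo = split_pair(x)
    print("FP32 alone, error:", abs(x - hi))
    print("FP32 pair, error:", abs(x - (hi + lo)))
    print("pair for 1e-50:", split_pair(1e-50))  # (0.0, 0.0): underflow
    try:
        split_pair(1e300)
    except OverflowError as err:
        print("pair for 1e300:", err)
    print("with exponent:", join_with_exponent(*split_with_exponent(1e300)))
```

Storing the exponent as a separate integer is the simplest of the two options mentioned; packing exponent and significand into one 32-bit integer saves space but makes every operation do more bit manipulation.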

In deep learning, this is huge! If you have numbers that big, then something has definitely already gone wrong. If you have numbers that small, then you definitely don't care.
I wonder if deep learning will save us from poorly conditioned linear algebra too.