Hacker News
by li-ch 3748 days ago
Seems that CNTK already implemented this.

https://github.com/Microsoft/CNTK/wiki/Enabling-1bit-SGD


That quantizes the gradient to compress the communication between nodes, which is cool, but each node still has to compute a full 32-bit floating-point gradient locally. What GP is asking for is a way to avoid floating-point math entirely.
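A rough sketch of the distinction, loosely modeled on the idea behind 1-bit SGD (sign quantization with error feedback). This is not CNTK's code. The per-tensor reconstruction values and the function names `one_bit_quantize` / `dequantize` are assumptions for illustration. Note that the input `grad` is still a float32 array computed locally; only what goes over the wire shrinks to one bit per element.

```python
import numpy as np

def one_bit_quantize(grad, residual):
    """Quantize a float32 gradient to 1 bit per element with error feedback.

    Assumption: one pair of reconstruction values per tensor (mean of the
    non-negative entries and mean of the negative entries).
    """
    g = grad + residual                  # add the error left over from the last step
    bits = g >= 0                        # the only per-element data that gets sent
    pos = g[bits].mean() if bits.any() else np.float32(0.0)
    neg = g[~bits].mean() if (~bits).any() else np.float32(0.0)
    decoded = np.where(bits, pos, neg).astype(np.float32)
    new_residual = g - decoded           # quantization error, carried to the next step
    return bits, (pos, neg), new_residual

def dequantize(bits, levels):
    """Rebuild an approximate float32 gradient on the receiving node."""
    pos, neg = levels
    return np.where(bits, pos, neg).astype(np.float32)
```

For example, `one_bit_quantize(np.array([0.5, -0.2, 0.1, -0.4], dtype=np.float32), np.zeros(4, dtype=np.float32))` sends the sign bits `[True, False, True, False]` plus two floats (about 0.3 and -0.3), and keeps the leftover error locally so it is not lost.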

If you could implement training with only single-bit operations rather than floating point math, a hardware implementation could be several orders of magnitude faster and more efficient than current CPUs/GPUs. That would certainly usher in a revolution in computer architecture.
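For a sense of what "single-bit operations" can replace, here is the well-known trick from binarized networks for a dot product of two vectors whose entries are all +1 or -1: pack each into a bit word and use XOR plus a popcount instead of multiply-adds. This covers only the inner product, not a full bit-only training procedure (which is the open problem described above). The encoding (+1 as bit 1, -1 as bit 0) and the helper names `pack` / `binary_dot` are assumptions for illustration.

```python
def pack(signs):
    """Pack a list of +1/-1 values into an int: bit i is 1 iff signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n using only bit ops.

    Matching positions contribute +1 and mismatches contribute -1, so
    dot = n - 2 * (number of mismatches) = n - 2 * popcount(a XOR b).
    """
    mismatches = bin(a_bits ^ b_bits).count("1")
    return n - 2 * mismatches
```

With `a = [1, -1, 1, 1]` and `b = [1, 1, -1, 1]` the vectors differ in two positions, so `binary_dot(pack(a), pack(b), 4)` gives 0, the same as the float dot product. In hardware, one XOR and one popcount on a 64-bit word replace 64 multiply-adds.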