by osaariki 2396 days ago
You're right, CryptoNets used a data layout optimized for throughput, with a batch size of 4096. Since then we've done a lot of work on low-latency inference with our CHET compiler [1], and my colleagues have done the same with LoLa [2]. It all comes down to the data layouts you use.
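To make the layout point concrete, here's a toy sketch in plain Python. It is my own illustration, not code from CryptoNets, CHET or LoLa. Lists stand in for the slot vectors of ciphertexts, nothing is encrypted, and the function names and the 8-slot size are hypothetical. With a batch of one, the batch-major (throughput) layout still needs one ciphertext per pixel, while packing a single image's pixels into consecutive slots needs far fewer ciphertexts.

```python
from typing import List

# Stand-in for the slot vector of one ciphertext (no encryption is done here).
Vector = List[float]


def batch_major_layout(images: List[Vector], n_slots: int) -> List[Vector]:
    """Throughput-oriented packing: one vector per pixel position,
    where slot j holds that pixel from image j."""
    if len(images) > n_slots:
        raise ValueError("batch is larger than the slot count")
    n_pixels = len(images[0])
    vectors = []
    for p in range(n_pixels):
        v = [0.0] * n_slots  # unused slots stay zero
        for j, img in enumerate(images):
            v[j] = img[p]
        vectors.append(v)
    return vectors


def pixel_major_layout(image: Vector, n_slots: int) -> List[Vector]:
    """Latency-oriented packing: a single image's pixels fill consecutive slots."""
    vectors = []
    for start in range(0, len(image), n_slots):
        chunk = image[start:start + n_slots]
        vectors.append(chunk + [0.0] * (n_slots - len(chunk)))  # zero-pad the last vector
    return vectors


if __name__ == "__main__":
    img = [float(i) for i in range(16)]  # one flattened 4x4 "image"
    a = batch_major_layout([img], n_slots=8)
    b = pixel_major_layout(img, n_slots=8)
    print(len(a), len(b))  # 16 vectors vs 2 vectors for a batch of one
```

The catch, which the sketch doesn't model, is that the packed layout needs slot rotations to line pixels up for convolutions and dense layers, so the layout choice is a trade-off rather than a free win.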

[1]: https://www.cs.utexas.edu/~roshan/CHET.pdf
[2]: https://arxiv.org/pdf/1812.10659.pdf