
	by osaariki 2396 days ago
	You're right: CryptoNets used a data layout optimized for throughput, with a batch size of 4096. Since then we've done a lot of work on low-latency inference, both with our CHET compiler [1] and, from my colleagues, with LoLa [2]. It all comes down to the data layouts you use.

	[1]: https://www.cs.utexas.edu/~roshan/CHET.pdf
	[2]: https://arxiv.org/pdf/1812.10659.pdf
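
	To make the layout point concrete, here is a rough Python sketch of the two packing strategies. It is only an illustration under stated assumptions: plain lists stand in for ciphertext slot vectors (nothing is encrypted), and the function names and the `SLOTS` constant are made up for this example rather than taken from CryptoNets, CHET, or LoLa. The batch-major layout puts the same pixel from many inputs into one ciphertext, so a single input still pays for the whole batch. The dense layout packs one input's values into as few ciphertexts as possible.

```python
from math import ceil

# Hypothetical slot count per ciphertext; chosen to match the batch size of 4096
# mentioned in the comment. Plain lists stand in for encrypted slot vectors.
SLOTS = 4096


def batch_major_layout(images):
    """Throughput-style layout: one 'ciphertext' per pixel position.

    Slot i of ciphertext p holds pixel p of image i, so every homomorphic
    operation would act on all images in the batch at once.
    """
    n_pixels = len(images[0])
    return [[img[p] for img in images] for p in range(n_pixels)]


def dense_layout(image, slots=SLOTS):
    """Latency-style layout: pack a single image into as few 'ciphertexts'
    as possible, zero-padding the last one."""
    n_cts = ceil(len(image) / slots)
    padded = image + [0] * (n_cts * slots - len(image))
    return [padded[i * slots:(i + 1) * slots] for i in range(n_cts)]


def slot_utilization(ciphertexts, useful_values, slots=SLOTS):
    """Fraction of available slots that carry real data."""
    return useful_values / (len(ciphertexts) * slots)
```

	For a single input with, say, 784 values (an illustrative size, not a figure from the comment), `batch_major_layout([image])` yields 784 ciphertexts that each use 1 slot out of 4096, while `dense_layout(image)` yields 1 ciphertext. The work per ciphertext operation is about the same in both cases, which is why the batch-major layout only pays off when the batch is actually full.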