
Hacker News


	by gwern 2282 days ago
	Eh. It's built on Transformers, and people have already demonstrated considerable model distillation/compression on those, as with every other kind of NN. And as they note, once you've trained a teacher model, you can probably train a wide, flat student model to get similar results. (As I recall, WaveNet used to be similarly slow, but even without the Parallel WaveNet retraining, properly caching repeated states could make it orders of magnitude faster and bring it close to realtime.)
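	To make "caching repeated states" concrete, here is a minimal sketch. It assumes a toy WaveNet-like stack of dilated causal convolutions (kernel size 2, one scalar channel per layer, made-up weights). `DILATIONS`, `WEIGHTS`, and all function names are hypothetical, and this is an illustration of the queue trick usually associated with the "Fast WaveNet" generation algorithm, not DeepMind's code. The naive generator recomputes every layer over the whole history for each new sample. The cached one keeps, per layer, a queue of that layer's last `d` inputs, so each sample costs one tap per layer.

```python
import math
from collections import deque

# Hypothetical toy model: dilated causal convs, kernel size 2, scalar channels.
DILATIONS = [1, 2, 4, 8]
# (w_past, w_now, bias) per layer; arbitrary illustrative values.
WEIGHTS = [(0.5, 0.9, 0.1), (-0.3, 0.8, 0.0), (0.7, 0.6, -0.2), (0.2, 1.1, 0.05)]


def layer(w_past, w_now, bias, past, now):
    """One kernel-size-2 causal conv tap followed by tanh."""
    return math.tanh(w_past * past + w_now * now + bias)


def generate_naive(seed, steps):
    """Recompute every layer over the full history at every step: O(steps^2)."""
    xs = [seed]
    for _ in range(steps):
        h = xs[:]  # layer-0 input is the whole sequence generated so far
        for d, (wp, wn, b) in zip(DILATIONS, WEIGHTS):
            # zero-pad before time 0 (causal padding)
            h = [layer(wp, wn, b, h[t - d] if t >= d else 0.0, h[t])
                 for t in range(len(h))]
        xs.append(h[-1])  # feed the newest top-layer output back in
    return xs[1:]


def generate_cached(seed, steps):
    """Keep a queue of each layer's last d inputs: O(layers) work per sample."""
    # Each queue starts as d zeros, matching the naive version's padding.
    queues = [deque([0.0] * d, maxlen=d) for d in DILATIONS]
    x = seed
    out = []
    for _ in range(steps):
        h = x
        for q, (wp, wn, b) in zip(queues, WEIGHTS):
            past = q[0]   # this layer's input from exactly d steps ago
            q.append(h)   # store current input; maxlen evicts the oldest
            h = layer(wp, wn, b, past, h)
        out.append(h)
        x = h  # autoregressive feedback
    return out
```

Both functions compute the same outputs, because a causal layer's output at time t depends only on inputs at times up to t. The only difference is whether those intermediate activations are recomputed or remembered.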