| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by CupOfJava 2619 days ago
	openai says they can do it: Long-term structure MuseNet uses the recompute and optimized kernels of Sparse Transformer to train a 72-layer network with 24 attention heads—with full attention over a context of 4096 tokens. This long context may be one reason why it is able to remember long term structure in a piece