Hacker News
by CupOfJava 2619 days ago
OpenAI says they can do it:

Long-term structure: MuseNet uses the recompute and optimized kernels of Sparse Transformer to train a 72-layer network with 24 attention heads, with full attention over a context of 4096 tokens. This long context may be one reason why it is able to remember long-term structure in a piece.
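
For a sense of scale, here is some rough arithmetic on what full attention over 4096 tokens implies. This is my own back-of-the-envelope sketch, not something from OpenAI's post: only the layer, head, and context counts come from the quote. The 2-byte score size (fp16) and the function names are assumptions, and it counts the dense score matrix without any masking savings.

```python
# Back-of-the-envelope sketch. Only these three figures come from the quote.
N_LAYERS = 72
N_HEADS = 24
CONTEXT = 4096

# Assumption: each attention score is stored in 2 bytes (fp16).
# The quoted text does not state the precision used.
BYTES_PER_SCORE = 2


def attention_scores_per_head(context: int) -> int:
    """Full attention compares every token with every token: context^2 scores."""
    return context * context


def attention_bytes_all_layers(context: int, heads: int, layers: int,
                               bytes_per_score: int) -> int:
    """Bytes needed if every head in every layer kept its score matrix
    in memory for the backward pass (one sequence, dense upper bound)."""
    return attention_scores_per_head(context) * heads * layers * bytes_per_score


per_head = attention_scores_per_head(CONTEXT)
total = attention_bytes_all_layers(CONTEXT, N_HEADS, N_LAYERS, BYTES_PER_SCORE)

print(f"scores per head per layer: {per_head:,}")
print(f"all heads and layers: {total / 2**30:.1f} GiB per sequence")
```

Under those assumptions the stored score matrices alone come to tens of GiB for a single sequence, which is plausibly why recomputing activations during the backward pass, instead of keeping them all, matters at this context length.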