|
|
|
|
|
by CupOfJava
2619 days ago
|
|
openai says they can do it: Long-term structure
MuseNet uses the recompute and optimized kernels of Sparse Transformer to train a 72-layer network with 24 attention heads—with full attention over a context of 4096 tokens. This long context may be one reason why it is able to remember long term structure in a piece |
|