by whimsicalism
513 days ago
No one publicly pushes any technique very far except for Meta, and it's true they continue to train dense models for whatever reason. The transformer was an entirely new architecture, a very different kind of step change from this. Edit: and Alibaba.