by limapedro 624 days ago
This is such an interesting paper. Sadly, they don't have big models. I'd like to see a model trained on TinyStories or even C4, since it should be faster than the transformer variant, and see how it compares.