by p1esk 1930 days ago
They did all that before (https://arxiv.org/abs/2101.06840), but they could only fit a model with 13B parameters on a single V100.
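
For a rough sense of why the limit lands around 13B, here is a back-of-the-envelope sketch. It assumes the usual mixed-precision Adam accounting (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 master weights plus Adam momentum and variance), a 32 GB V100, and that only the fp16 weights stay on the GPU while everything else is offloaded to CPU memory. The function name and constants are mine, for illustration only, and activations and working buffers are ignored.

```python
# Back-of-the-envelope estimate. All constants are assumptions
# (mixed-precision Adam accounting, 32 GB V100), not numbers quoted in this thread.
GB = 1e9
V100_MEMORY_GB = 32  # assumed: the 32 GB variant of the V100


def training_memory_gb(n_params: float) -> dict:
    """Split per-parameter training state into GPU-resident and offloaded parts."""
    fp16_params = 2 * n_params        # bytes: fp16 weights, assumed to stay on the GPU
    fp16_grads = 2 * n_params         # bytes: fp16 gradients, assumed offloaded to CPU
    optimizer_states = 12 * n_params  # bytes: fp32 weights + Adam momentum + variance
    return {
        "gpu_fp16_params": fp16_params / GB,
        "offloaded": (fp16_grads + optimizer_states) / GB,
        "total": (fp16_params + fp16_grads + optimizer_states) / GB,
    }


est = training_memory_gb(13e9)
for name, gigabytes in est.items():
    print(f"{name}: {gigabytes:.0f} GB")
print("fp16 weights fit on the GPU:", est["gpu_fp16_params"] < V100_MEMORY_GB)
```

Under those assumptions the fp16 weights alone take most of the card, which is consistent with the limit sitting near 13B; whatever is left has to cover activations and temporary buffers.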