Hacker News new | ask | show | jobs
by jordn 1341 days ago
This is planned to be 70B but trained in the chinchilla-optimal way (more data + training). Scaling laws suggest this should outperform the base 175B GPT-3. Then release the base model as well as the RLHF-tuned models.