by jordn 1341 days ago
This is planned to be a 70B-parameter model, but trained in the Chinchilla-optimal way (more data and more training for its size). Scaling laws suggest it should outperform the base 175B GPT-3. The plan is then to release the base model as well as the RLHF-tuned models.
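
For a rough sense of the numbers, here is a back-of-the-envelope sketch. It assumes the common reading of the Chinchilla result as roughly 20 training tokens per parameter, the usual approximation of training compute as 6 * N * D FLOPs, and the roughly 300B training tokens reported for GPT-3. The function names are made up for illustration and none of these figures come from the project itself.

```python
# Back-of-the-envelope comparison, not a statement about any actual training run.
# Assumptions: ~20 tokens per parameter (Chinchilla rule of thumb) and
# training compute ~= 6 * params * tokens FLOPs.

TOKENS_PER_PARAM = 20.0  # rule-of-thumb ratio, an assumption


def chinchilla_tokens(n_params: float, ratio: float = TOKENS_PER_PARAM) -> float:
    """Approximate compute-optimal token count for a model of n_params."""
    return n_params * ratio


def train_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute using the 6 * N * D rule."""
    return 6.0 * n_params * n_tokens


planned_params = 70e9                        # the 70B model from the comment
planned_tokens = chinchilla_tokens(planned_params)
planned_flops = train_flops(planned_params, planned_tokens)

gpt3_params = 175e9                          # base GPT-3
gpt3_tokens = 300e9                          # approximate, as reported for GPT-3
gpt3_flops = train_flops(gpt3_params, gpt3_tokens)

print(f"70B model:  {planned_tokens:.2e} tokens, {planned_flops:.2e} FLOPs")
print(f"GPT-3 175B: {gpt3_tokens:.2e} tokens, {gpt3_flops:.2e} FLOPs")
print(f"GPT-3 tokens per parameter: {gpt3_tokens / gpt3_params:.2f}")
```

Under these assumptions the 70B model sees about 1.4T tokens against GPT-3's 300B, so it is smaller but trained on far more data and with more total compute, which is the sense in which GPT-3 is considered undertrained for its size.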