	by jordn 1341 days ago
	This is planned to be a 70B-parameter model, but trained in the Chinchilla-optimal way (more data and more training for its size). Scaling laws suggest it should outperform the base 175B GPT-3. The plan is then to release the base model as well as the RLHF-tuned models.
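
To put rough numbers on the scaling argument, here is a back-of-the-envelope sketch. It relies on two rules of thumb that are not stated in the comment: roughly 20 training tokens per parameter as the Chinchilla-optimal ratio, and the common estimate of about 6 FLOPs per parameter per token for training compute. The 300B-token figure for GPT-3 is the commonly reported one and is also an assumption here. The function names are hypothetical, chosen only for illustration.

```python
# Rule-of-thumb constants (assumptions, not figures from the comment):
TOKENS_PER_PARAM = 20        # approximate Chinchilla-optimal tokens per parameter
FLOPS_PER_PARAM_TOKEN = 6    # common training-compute estimate: C ~ 6 * N * D


def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal token count for a model with n_params parameters."""
    return TOKENS_PER_PARAM * n_params


def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs for n_params parameters and n_tokens tokens."""
    return FLOPS_PER_PARAM_TOKEN * n_params * n_tokens


# Planned model: 70B parameters, trained on a Chinchilla-optimal token budget.
planned_params = 70e9
planned_tokens = chinchilla_optimal_tokens(planned_params)
planned_flops = training_flops(planned_params, planned_tokens)

# Base GPT-3: 175B parameters, assumed ~300B training tokens.
gpt3_params = 175e9
gpt3_tokens = 300e9
gpt3_flops = training_flops(gpt3_params, gpt3_tokens)

print(f"planned 70B: {planned_tokens:.2e} tokens, {planned_flops:.2e} FLOPs")
print(f"GPT-3 175B:  {gpt3_tokens:.2e} tokens, {gpt3_flops:.2e} FLOPs")
print(f"tokens per parameter, GPT-3: {gpt3_tokens / gpt3_params:.2f}")
```

Under these assumptions the 70B model would see on the order of 1.4 trillion tokens, while GPT-3 saw fewer than 2 tokens per parameter. That gap is the intuition behind a smaller model trained on more data being expected to come out ahead, though the constants are approximations and the real outcome depends on data quality and training details.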