| HN Mirror



	by gwern 2568 days ago
	It's a mix of OA 117M and 345M at the moment. I haven't observed too much in the way of overfitting yet, so there should still be benefits to going up another 4.4x in model size to 1.5B. My guess is that at 1.5B, it'll start being more important to improve the poetry corpus, since you can already start to see problems with it: the Alexander Pope brokenness and the occasional prose generation of footnotes/commentary are definitely undesirable, and I suspect there would be less 'run-on' effect in samples if the original corpus actually properly marked '<|endoftext|>' for each poem...
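
	To make the '<|endoftext|>' point concrete, here is a minimal sketch of what marking each poem could look like. It assumes the corpus has already been split into one string per poem, which the comment does not say; the helper name `mark_poems` and the sample poems are hypothetical, for illustration only.

```python
# GPT-2's document-separator token, as named in the comment above.
END_OF_TEXT = "<|endoftext|>"


def mark_poems(poems):
    """Join poems into one training text, closing each with END_OF_TEXT.

    Assumption: `poems` is an iterable of strings, one poem per string.
    Blank entries are skipped so no empty documents are emitted.
    """
    cleaned = [poem.strip() for poem in poems if poem.strip()]
    return "".join(f"{poem}\n{END_OF_TEXT}\n" for poem in cleaned)


# Hypothetical two-poem corpus with one blank entry that gets dropped.
corpus = mark_poems(["First poem,\nline two.", "   ", "Second poem."])
print(corpus)
```

	With an explicit boundary after every poem, the model at least has a chance to learn where one poem stops and the next begins; whether that actually reduces the run-on effect would still need to be checked on samples.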