
	by MacsHeadroom 1110 days ago
	30B and 40B parameter models regularly crush GPT-3 175B (Davinci) on every benchmark. GPT-3.5 is probably a 13B parameter model, and it beats the original GPT-3 175B on most benchmarks (but not the more recent finetunes like 175B Davinci-003), so hyperparameters are clearly very important.

1 comment

How are hyperparameters tuned for GPT-3.5? Has the method they use leaked anywhere?