| HN Mirror

	by trifurcate 1176 days ago
	In my experience, the smaller models are almost completely worthless as-is. 65B is the only decent one (I'd say just behind gpt-3.5-turbo; obviously it's not instruction-tuned, so I mean the coherence of the core language model), and, understandably, people aren't really paying attention or devoting many resources to the largest one. 30B shows promise for specific tasks with fine-tuning, but 7B and 13B are just toys.