| HN Mirror



	by naillo 1300 days ago
	They're motivating that choice with this paper: https://arxiv.org/pdf/2203.15556.pdf. It shows that you can get better performance than GPT-3 with a much smaller model if you increase the training time and the training data by roughly 4x.
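
	As a rough illustration of that trade-off, here is a minimal sketch using the common approximation that training compute is about 6 x parameters x tokens. The model sizes and token counts are the approximate figures reported in the linked paper for Gopher and Chinchilla; the function name `train_flops` is made up for this example.

```python
# Rough sketch: compare training compute for a large model trained on
# fewer tokens against a 4x smaller model trained on more tokens.
# Assumption: training FLOPs ~= 6 * parameters * tokens (an approximation).

def train_flops(params: float, tokens: float) -> float:
    """Approximate training compute in FLOPs."""
    return 6.0 * params * tokens

# Approximate figures from the linked paper.
gopher = train_flops(params=280e9, tokens=300e9)      # large model, less data
chinchilla = train_flops(params=70e9, tokens=1.4e12)  # 4x smaller, more data

print(f"Gopher:     {gopher:.2e} FLOPs")
print(f"Chinchilla: {chinchilla:.2e} FLOPs")
print(f"Compute ratio: {chinchilla / gopher:.2f}")
```

	Under this approximation the two budgets come out within about 20% of each other, which is the point: similar compute, spent on more data instead of more parameters.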

1 comment

macrolime 1299 days ago

Larger models are still much better. Google's Parti model can render text perfectly and follows prompts far more accurately than Stable Diffusion. It has 20B parameters, and with the latest int8 optimizations it should in theory be possible to run it on a consumer 24GB card.

I think they're looking into larger models later, though.
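
A back-of-the-envelope check of the 24GB claim, as a minimal sketch: it counts only the memory for the weights and ignores activations, framework overhead, and any layers kept at higher precision. It also treats the card as having 24 GiB. The helper name `weight_memory_gib` is made up for this example.

```python
# Rough sketch: memory needed just to hold model weights at a given precision.
# Assumption: weights only; activations and runtime overhead are not counted.

def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Weight storage in GiB for a model with num_params parameters."""
    return num_params * bytes_per_param / 2**30

PARAMS = 20e9   # 20B parameters, as stated above
CARD_GIB = 24   # consumer card, treated here as 24 GiB

fp16 = weight_memory_gib(PARAMS, 2)  # 2 bytes per parameter
int8 = weight_memory_gib(PARAMS, 1)  # 1 byte per parameter

print(f"fp16 weights: {fp16:.1f} GiB, fits: {fp16 <= CARD_GIB}")
print(f"int8 weights: {int8:.1f} GiB, fits: {int8 <= CARD_GIB}")
print(f"int8 headroom: {CARD_GIB - int8:.1f} GiB")
```

At int8 the weights alone take about 18.6 GiB, leaving roughly 5 GiB for everything else, so "in theory" is the right hedge.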
