


	by visarga 1250 days ago
	There have been attempts to separate factual knowledge from language knowledge - for example DeepMind's RETRO, which retrieves from a database of about 2T tokens. RETRO manages to reach GPT-3 performance on some tasks with a model roughly 25x smaller. I believe smaller models are more useful for extractive and classification tasks than for creative text generation.
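
	To make the "facts live in the index, language lives in the model" idea concrete, here is a minimal sketch of the retrieval side only. It is not how RETRO works internally: RETRO feeds retrieved chunks to the model through chunked cross-attention, whereas this toy just prepends the best match to the query. The names `embed`, `retrieve`, `build_context` and the `FACTS` list are all made up for illustration, and the bag-of-words similarity stands in for a learned encoder.

```python
import math
from collections import Counter

# Hypothetical stand-in for the external knowledge store.
FACTS = [
    "Paris is the capital of France",
    "The Nile flows through Egypt",
    "Mount Everest is in the Himalayas",
]

def embed(text):
    # Toy bag-of-words vector; a real system would use a learned encoder.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query, index, k=1):
    # Return the k index entries most similar to the query.
    q = embed(query)
    ranked = sorted(index, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

def build_context(query, index, k=1):
    # Prepend retrieved facts so a small model does not have to memorize them.
    return "\n".join(retrieve(query, index, k) + [query])

if __name__ == "__main__":
    print(build_context("What is the capital of France", FACTS))
```

	The point of the split is that updating what the system knows means editing the index, not retraining the model.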