| HN Mirror



	by Veedrac 2361 days ago
	The appropriately large, publicly recognized models I know of use attention, which is too memory-hungry to work effectively on the CS-1. The datasets aren't the issue. I'm fine with skepticism. It's certainly plausible that they don't actually do all that well.
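
	For a rough sense of why attention is memory-hungry, here is a back-of-the-envelope sketch. It counts only the attention score matrices (batch x heads x seq x seq) that standard attention materializes per layer. The function name, the model configuration, and the 18 GiB on-chip budget are all assumptions picked for illustration, not measured numbers for the CS-1 or any particular model.

```python
def attention_scores_bytes(batch, heads, seq_len, bytes_per_elem=2):
    """Bytes to hold one layer's attention scores: batch x heads x seq x seq.

    bytes_per_elem=2 assumes 16-bit floats. Weights, optimizer state and
    every other activation are ignored here.
    """
    return batch * heads * seq_len * seq_len * bytes_per_elem


# Hypothetical configuration, for illustration only.
ASSUMED_ON_CHIP_BYTES = 18 * 1024**3  # assumed on-chip memory budget
ASSUMED_LAYERS = 24
ASSUMED_BATCH = 8
ASSUMED_HEADS = 16

for seq_len in (512, 1024, 2048, 4096):
    per_layer = attention_scores_bytes(ASSUMED_BATCH, ASSUMED_HEADS, seq_len)
    # Training keeps these around for the backward pass, so multiply by layers.
    total = per_layer * ASSUMED_LAYERS
    share = total / ASSUMED_ON_CHIP_BYTES
    print(f"seq_len={seq_len}: {total / 1024**3:.2f} GiB, {share:.0%} of assumed budget")
```

	The point is the scaling rather than the exact figures: the score matrices grow with the square of the sequence length, so doubling the context quadruples this term.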