	by Tarq0n 81 days ago
	If it works for predicting the next token in a very long stream of tokens, why not? The question is what architecture and training regimen it needs to generalize.