| HN Mirror



	by wongarsu 362 days ago
	Maybe we'll call it "continuous RLHF" or something like that. But you may be right that the dynamic part is the biggest architectural shift needed. You can simulate a lot with in-context memory or clever retrieval, but memory alone doesn't let the model get better at chess the way a human does.