HN Mirror

Maybe we'll call it "continuous RLHF" or something like that.

But you might be right that the dynamic part is the biggest architectural shift needed. You can simulate a lot with in-context memory or clever retrieval, but memory alone doesn't let the model get better at chess the way a human does.