Hacker News
by wongarsu 362 days ago
Maybe we'll call it "continuous RLHF" or something like that.

But you might be right that the dynamic part is the biggest architectural shift needed. You can simulate a lot with in-context memory or clever retrieval, but memory alone doesn't let the model get better at chess the way a human does.