by hackerlight
823 days ago
> online learning - the ability to act then see the results of your action and learn from that. I don't think that should be necessary, if you are talking about weight updates. Offline batch mode Q-learning achieves the same thing. By online learning, did you mean working memory? I'd agree with that. Whether it's RAG, ultra-long-context, an LSTM-like approach, or something else, is TBD.

I don't think there is any substitute for a predict-act-learn loop here. You don't want to predict what someone else has done (which is essentially what LLMs learn from a training set); you want to learn how your OWN predictions are wrong, and how to update them.
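
To make the two positions concrete, here is a rough toy sketch (not anything either of us has actually run) of tabular Q-learning on a made-up 5-state chain. Everything in it is an assumption for illustration: the environment, the hyperparameters, and the function names (`step`, `online_q_learning`, `collect_log`, `offline_q_learning`). The online version learns from the results of its own actions. The offline version only ever sees a fixed log produced by a different, uniformly random policy.

```python
import random

N_STATES = 5          # hypothetical chain: states 0..4, state 4 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
GAMMA = 0.9           # discount factor (assumed)
ALPHA = 0.5           # learning rate (assumed)


def step(state, action):
    """Deterministic toy environment: reward 1.0 only on reaching the goal."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_update(q, s, a, r, s2, done):
    """Standard Q-learning update, shared by both variants."""
    target = r if done else r + GAMMA * max(q[s2])
    q[s][a] += ALPHA * (target - q[s][a])


def online_q_learning(episodes=200, epsilon=0.2, seed=0):
    """Predict-act-learn: the agent updates from its OWN actions as it goes."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)              # explore
            else:
                best = max(q[s])                     # exploit, random tie-break
                a = rng.choice([x for x in ACTIONS if q[s][x] == best])
            s2, r, done = step(s, a)
            q_update(q, s, a, r, s2, done)           # learn from own outcome
            s = s2
    return q


def collect_log(episodes=200, seed=1):
    """Transitions logged by a uniformly random policy (someone else's actions)."""
    rng = random.Random(seed)
    log = []
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = rng.choice(ACTIONS)
            s2, r, done = step(s, a)
            log.append((s, a, r, s2, done))
            s = s2
    return log


def offline_q_learning(log, sweeps=50):
    """Offline batch mode: repeated sweeps over a fixed log, no new actions."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, r, s2, done in log:
            q_update(q, s, a, r, s2, done)
    return q


def greedy_policy(q):
    """Best action in each non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print("online :", greedy_policy(online_q_learning()))
    print("offline:", greedy_policy(offline_q_learning(collect_log())))
```

On a problem this small, where the random log happens to cover every state-action pair that matters, both variants end up with the same greedy policy, which is the sense in which offline batch Q-learning "achieves the same thing". The sketch says nothing about what happens when the log does not cover the situations the learner's own predictions would lead it into, and that is the case the predict-act-learn argument is about.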