
by maxaravind 98 days ago

There has been a lot of talk about how continual learning might be "just an engineering challenge" and that we could have agents that continuously learn from experience simply by having longer and longer context windows.

Here is a clip of Dario hinting at something similar: https://www.youtube.com/watch?v=Z0x99Uu4rJc

What I am trying to argue in the article is that such a view might be misplaced: just extending the context length and adding more instructions in the context will not get you continual learning, because the representational capacity of the weights will be the limiting factor.

Just a fun way to think about it. Would love to hear your thoughts.

1 comments

qsera 98 days ago

>just extending the context length and adding more instructions in the context will not get you continual learning...

I agree. But I am wondering if context would help in answering superficial questions and only fail when answering questions that require deeper understanding.

maxaravind 97 days ago

I'd say the way to think about it is in terms of whether the questions you ask are in-distribution or out-of-distribution with respect to the model's training dataset.

Consider this: if something fundamental has changed in the world after the model was released (i.e., after the knowledge cutoff date), then it would be very difficult for the model to reason about it. One concrete example: if you ask Opus or any decent coding model to do effort estimation on a coding task, it will come up with multi-week timelines. The models themselves don't know that, because "they exist", these timelines have now been slashed to a few hours. You can try saying this in the prompt, but they don't seem to internalise it.

qsera 97 days ago

So basically that is what I was saying.

Imagine an LLM that can also do OCR. Would it be possible to make it recognise a totally new letter by showing it only a single picture of that letter and stating in the context what it is?

I think it would not be possible. That would be a good demonstration of the point I (and possibly you as well) am trying to get across.
