|
|
|
|
|
by tliltocatl
295 days ago
|
|
> LLM doesn't learn incrementally from previous encounters This. Lack of any way to incorporate previous experience seems like the main problem. Humans are often confidently wrong as well - and avoiding being confidently wrong is actually something one must learn rather than an innate capability. But humans wouldn't repeat same mistake indefinitely. |
|
The feedback you get is incredibly entangled, and disentangling it to get at the signals that would be beneficial for training is nowhere near a solved task.
Even OpenAI has managed to fuck up there - by accidentally training 4o to be a fully bootlickmaxxed synthetic sycophant. Then they struggled to fix that for a while, and only made good progress at that with GPT-5.