by prmph
172 days ago
Nothing will really work when the models fail at the most basic of reasoning challenges. I've had models do the complete opposite of what I've put in the plan and guidelines. I've had them go re-read the exact sentences and still come to the opposite conclusion, and my instructions are nothing complex at all. I used to think one could build a workflow and process around LLMs that extracts good value from them consistently, but I'm now not so sure. I notice that sometimes the model will be in a good state and do a long chain of good-quality edits. The problem is, it's still a crapshoot whether you can get them into that good state.
LLMs become increasingly error-prone as their memory fills up. Just like humans.
In VS Code Copilot you can keep track of how many tokens the LLM is dealing with in real time with "Chat Debug".
When it reaches 90k tokens I expect degraded intelligence and brace for a possible forced summarization.
Sometimes I just stop the LLM and continue the work in a new session.
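
If you want to automate that habit outside the editor, here is a rough sketch of the idea. Everything in it is an assumption for illustration: the `SessionBudget` class and `estimate_tokens` helper are hypothetical names, and the "about 4 characters per token" estimate is only a rule of thumb, not what Copilot actually counts. Only the 90k threshold comes from my experience above.

```python
# Hypothetical sketch: track an approximate token count for a session
# and flag when it is time to start a fresh one.

TOKEN_LIMIT = 90_000  # the point where I start expecting degraded output


def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return len(text) // 4


class SessionBudget:
    """Accumulates estimated tokens across the messages of one session."""

    def __init__(self, limit: int = TOKEN_LIMIT) -> None:
        self.limit = limit
        self.used = 0

    def add(self, message: str) -> None:
        # Count every prompt, reply, and attached file that enters the context.
        self.used += estimate_tokens(message)

    def should_restart(self) -> bool:
        # True once the estimated usage reaches the limit.
        return self.used >= self.limit


if __name__ == "__main__":
    budget = SessionBudget()
    budget.add("some long prompt or file contents " * 1000)
    if budget.should_restart():
        print("Context is getting full: summarize and start a new session.")
    else:
        print(f"Roughly {budget.used} of {budget.limit} tokens used.")
```

A real tokenizer would give better numbers, but for deciding when to bail out of a session a crude estimate is probably enough.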