|
|
|
|
|
by cossatot
66 days ago
|
|
Maybe, just maybe, this is of obvious utility to the many people who have needs that are not yours? I very regularly need to interact with my work through a python interpreter. My work is scientific programming. So the variables might be arrays with millions of elements. In order to debug, optimize, verify, or improve in any way my work, I cannot rely on any other methods than interacting with the code as it's being run, or while everything is still in memory. So if I want to really leverage LLMs, especially to allow them to work semi-autonomously, they must be able to do the same. I'm not going to dump tens of GB of stuff to a log file or send it around via pipes or whatever. Why is there a nan in an array that is the product of many earlier steps in a code that took an hour to run? Why are certain data in a 200k-variable system of equations much harder to fit than others, and which equations are in tension with each other to prevent better convergence? Are interpreters and pdb not great, previously-existing tools for this kind of work? Does a new tool that lets LLMs/agents use them actually represent some sort of hack job because better solutions have existed for years? |
|
To avoid polling, you need to run the process with some knowledge of the internal interpreter state. Then a surprising number of edge cases start showing up once you start using it for real data science workflows. How do you support built-in debuggers? How do you handle in-band help? How do you handle long-running commands, interrupts, restarts, or segfaults in the interpreter? How do you deal with echo in multi-line inputs? How do you handle large outputs without filling the context window? Do you spill them to the filesystem somewhere instead of just truncating them, so the model can navigate them? What if the harness doesn’t have file tools? And so on.
Then there is sandboxing, which becomes another layer of complexity wrapped into the same tool.
I’ve been building a tool around this problem: `mcp-repl` https://github.com/posit-dev/mcp-repl
So tmux helps, but even with a skill and some shims, it does not really solve the core problem.