by pedrovhb 1031 days ago
That's nice! I've been thinking a lot about models running code lately, and using GPT-4 with its code interpreter to try out ideas quickly. Have you considered, as OpenAI does, using IPython for this?

What strikes me is that there isn't yet a great interface for models to explore and edit code. Using the interactive interpreter mode in ChatGPT is a rather frustrating cycle of trying something, it not working properly, and the model continuing to debug it. Each time it does, it has to rewrite the whole code again. Then it has to redefine even unchanged functions to update definitions that reference them. Half of the time, it redefines a class with "# ... (same as above)" and only after executing does it realize the other methods were actually needed, requiring another redefinition. This is quite wasteful, given the current technology's context size and output speed limitations.

I've been building some tools to make the process faster. Initially, I wrote some IPython extensions and tricked the model into installing them. For instance, IPython allows you to set up an extension that preprocesses cell input before running it. ChatGPT's interactive interpreter can use IPython cell magics like `%%writefile` or `%%bash`, but it runs into silly issues too often. It has a nearly irresistible urge to start the code cell with a comment announcing that it's now going to write file XYZ. A cell magic only works when it is the first line of the cell, so the comment invariably breaks it, and the model then can't write files without building a big multiline string literal and writing it to disk with code. To fix this, I made a quick IPython extension that strips any leading comment lines from a cell in which a cell magic is detected.
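
For anyone curious what such an extension might look like, here is a minimal sketch, not the actual extension described above. It assumes IPython 7 or later, where `input_transformers_cleanup` is a list of functions that take and return a list of lines and run before IPython's own magic handling. The function name `strip_leading_comments` is made up for illustration.

```python
def strip_leading_comments(lines):
    """Drop comment/blank lines that precede a cell magic; otherwise leave the cell alone."""
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("%%"):
            # A cell magic follows only comments/blank lines: cut everything before it.
            return lines[index:]
        if stripped and not stripped.startswith("#"):
            # Real code comes first, so this is an ordinary cell.
            break
    return lines


def load_ipython_extension(ipython):
    # Called by `%load_ext <module name>`; registers the transformer.
    ipython.input_transformers_cleanup.append(strip_leading_comments)


def unload_ipython_extension(ipython):
    ipython.input_transformers_cleanup.remove(strip_leading_comments)
```

Saved as an importable module, this would be loaded with `%load_ext` followed by the module name. Cells without a cell magic pass through unchanged.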

Despite these improvements, the whole process is still really dull. I built a script to generate importable Python modules, including arbitrary pip packages and other code, all crammed into a b64-encoded tar.gz string that automatically extracts and runs the pip install commands when imported. That's a long workaround and it really shouldn't be this difficult.
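
A rough sketch of the bundling idea, under my own assumptions rather than the actual script: `build_bundle` and `LOADER_TEMPLATE` are hypothetical names, the generated module unpacks into the current directory, and the pip step is only indicated (it installs whatever names are passed in, and is skipped when the list is empty).

```python
import base64
import io
import tarfile

# Source of the generated module. It unpacks itself as a side effect of import.
LOADER_TEMPLATE = '''\
import base64, io, subprocess, sys, tarfile

PAYLOAD = "{payload}"
PIP_PACKAGES = {packages!r}

def unpack(dest="."):
    data = base64.b64decode(PAYLOAD)
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        tar.extractall(dest)
    if PIP_PACKAGES:
        subprocess.check_call([sys.executable, "-m", "pip", "install", *PIP_PACKAGES])

unpack()
'''


def build_bundle(files, pip_packages=()):
    """Return module source embedding `files` (name -> bytes) as a base64 tar.gz."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    payload = base64.b64encode(buffer.getvalue()).decode("ascii")
    return LOADER_TEMPLATE.format(payload=payload, packages=list(pip_packages))
```

Writing the returned string to something like `bundle.py` and importing it would extract the files. Only do this with archives you built yourself, since `extractall` trusts the paths in the archive.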

I pivoted and am now writing a more generic interface for models to write, explore, edit, and run code and tests. I have several half-prototyped ideas, but it's starting to take shape. The main feature allows specific code chunks to be referenced and edited/removed/appended to. This is done by letting the model specify it wants to replace some chunk of code identified by a class/function/method qualified name, or by sections delimited by `# section: <name>` and `# endsection` comments. This way, it can replace just `Foo.NestedFooClass.bar_method` in xyz_module without redefining the whole module.
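
To make the qualified-name idea concrete, here is a small sketch using the standard `ast` module (Python 3.8+ for `end_lineno`). It is an illustration under assumptions, not the prototype itself: `replace_by_qualname` is a hypothetical name, it assumes space indentation, and it does not handle the `# section:` comments.

```python
import ast
import textwrap


def replace_by_qualname(source, qualname, new_code):
    """Replace the class/function named by a dotted qualname; return the new source."""
    body = ast.parse(source).body
    node = None
    for part in qualname.split("."):
        # Descend one level per name component, looking only at defs and classes.
        node = next(
            (n for n in body
             if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
             and n.name == part),
            None,
        )
        if node is None:
            raise KeyError(f"{qualname!r} not found")
        body = node.body
    # Decorators sit above the def/class line, so start from the earliest one.
    start = min([node.lineno] + [d.lineno for d in node.decorator_list])
    lines = source.splitlines(keepends=True)
    # Re-indent the replacement to the depth of the node being replaced.
    indent = " " * node.col_offset
    replacement = textwrap.indent(textwrap.dedent(new_code).strip("\n"), indent) + "\n"
    return "".join(lines[: start - 1]) + replacement + "".join(lines[node.end_lineno:])
```

The model then only has to emit the qualified name and the new body, and everything else in the module is left byte-for-byte as it was.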

Then there's still a bunch of other low-hanging fruit to implement that I believe can significantly improve the whole process. Automated tests or other analysis like mypy/pyright could be run after an edit, documentation could be provided for the relevant objects after an uncaught exception, imports could be auto-added when missing and unambiguous, and better search and exploration tools could be provided to avoid printing out full codebases hogging precious context. Local models offer even more possibilities. For example, early stopping on syntax or name errors could be implemented, along with logit warping to boost tokens corresponding to names/keywords valid in the current scope. A lot of these things are already built, just packaged for humans instead (say, the Jedi autocomplete suggestions could be used for identifying tokens to boost).
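
As one example of how cheap some of these are, here is a sketch of the "auto-add unambiguous imports" idea. The `KNOWN_IMPORTS` table is a hypothetical allowlist I made up, and the name analysis ignores scoping entirely, so treat it as a starting point only.

```python
import ast
import builtins

# Hypothetical allowlist: a name only gets an import if it maps to exactly one statement.
KNOWN_IMPORTS = {
    "os": "import os",
    "re": "import re",
    "json": "import json",
    "np": "import numpy as np",
    "pd": "import pandas as pd",
}


def add_missing_imports(code, known=KNOWN_IMPORTS):
    """Prepend imports for names that are used, never bound, and listed in `known`."""
    tree = ast.parse(code)
    bound = set(dir(builtins))
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            # Load context means the name is read; Store/Del means it is bound here.
            (used if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
    missing = sorted(name for name in used - bound if name in known)
    if not missing:
        return code
    return "\n".join(known[name] for name in missing) + "\n" + code
```

Running this over each block before execution would save a full round trip every time the model forgets `import json`.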

Right now, models are effectively stuck writing code in Notepad with a broken backspace key, and it could be a lot easier than that.

2 comments

Do you have any demo or GitHub repo with your prototype? I've checked your blog but found no recent posts.

I was also thinking about deeper notebook and LLM integration for linear scripts.

Same! One path forward with Open Interpreter is to make a "self-writing Jupyter notebook" with the same UI as Jupyter— edit code, rearrange it, export as a py, etc. Any thoughts on how you'd do that deeper notebook/LLM integration, or what it might need?
Dozens of great ideas in here, Pedro.

> The main feature allows specific code chunks to be referenced and edited/removed/appended to.

Do you know if it works to just include in the system message something like "When editing a function, don't rewrite the whole thing, be sure to edit modules piecemeal e.g. Foo.NestedFooClass.bar_method = ..."?

> imports could be auto-added when missing and unambiguous

GREAT idea. You're right, tons of low-hanging fruit here. A preprocess.py file should probably be added to Open Interpreter with all these ideas: what do LLMs tend to fail at that we can programmatically correct in each code block? Then just run that on each block it writes before asking the user for approval to run it.

> logit warping to boost tokens corresponding to names/keywords valid in the current scope

WOAH. Have you heard of outlines? (https://normal-computing.github.io/outlines/) Lets you put in any Regex you want, then as the LLM is generating, logit bias is used to conform the output to that Regex. Works for local models. I can totally imagine pairing some linting function with outlines to get models to write automatically linted code on the first pass.

For my thing, I abandoned IPython since I wanted a more uniform approach to multiple programming languages. Open Interpreter keeps a single subprocess open with `python -i` to get a live Python interpreter that displays real-time output. It acts the same as IPython, but I can just swap out that command (for example, with `node -i`) to get different languages. I think this also gets around the issues with cell magic, which, as you've said, might confuse it. Instead we just ask the model to tell us what language it wants to use + the code it wants to run in a single function call.
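
A stripped-down sketch of the persistent-subprocess approach, to show the shape of it. This is not Open Interpreter's actual code: the `PythonSession` class and the sentinel string are made up for illustration, the `exec(...)` trick is Python-specific, and stderr is discarded here just to keep REPL prompts out of the output (a real tool would capture tracebacks).

```python
import subprocess
import sys

SENTINEL = "__BLOCK_DONE__"  # hypothetical end-of-output marker


class PythonSession:
    """Keep one `python -i` process alive and run code blocks in it."""

    def __init__(self, command=None):
        self.proc = subprocess.Popen(
            command or [sys.executable, "-i", "-q", "-u"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # prompts and tracebacks are dropped in this sketch
            text=True,
        )

    def run(self, code):
        # Sending the block as a one-line exec(...) sidesteps the REPL's
        # blank-line rules for multi-line statements.
        self.proc.stdin.write(f"exec({code!r})\nprint({SENTINEL!r})\n")
        self.proc.stdin.flush()
        output = []
        while True:
            line = self.proc.stdout.readline()
            if not line or line.rstrip("\n") == SENTINEL:
                break
            output.append(line)
        return "".join(output)

    def close(self):
        self.proc.stdin.close()  # EOF ends the interactive loop
        self.proc.wait()
```

Because the process stays alive between calls, variables and functions defined in one block are still there for the next one.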

Have thought a lot about letting the user edit code. The next version of Open Interpreter = an electron app with a UI so you can edit code before it runs, better copy/paste, convo history, etc.

> with a broken backspace key

By the way, have you heard of this idea of training models to use a backspace key? Powerful stuff. We could think about implementing this in preprocess.py: another pass by GPT-4 or whatever to edit things it thinks aren't going to work. To the user it would just look like GPT-4 has a working backspace key.

Would love to see your github on these if you're willing to share! Open Interpreter is MIT license so ideally I could make it attractive enough / similar enough to your project to simply join up / contribute some of these ideas.