|
That's nice! I've been thinking a lot about models running code lately, and using GPT-4 with its code interpreter to try out ideas quickly. Have you considered using IPython for this, as OpenAI does? What strikes me is that there isn't yet a great interface for models to explore and edit code. Using the interactive interpreter mode in ChatGPT is a rather frustrating cycle: the model tries something, it doesn't work properly, and the model keeps debugging it. Each time it does, it has to rewrite the whole code again. Then it has to redefine even unchanged functions to update the definitions that reference them. Half of the time, it redefines a class with "# ... (same as above)" and only realizes after executing that the other methods were actually needed, requiring yet another redefinition. This is quite wasteful given current limits on context size and output speed. I've been building some tools to make the process faster. Initially, I wrote some IPython extensions and tricked the model into installing them. For instance, IPython lets you set up an extension that preprocesses cell input before running it. ChatGPT's interactive interpreter can use IPython magics like `%%writefile` or `%%bash`, but it runs into silly issues too often. It has a nearly irresistible urge to start the code cell with a comment spelling out that it's now going to write file XYZ. That invariably breaks the magic, since a cell magic has to be on the first line of the cell, so the model can't write to files without building a big multiline string literal and writing it to disk with code. To fix this, I made a quick IPython extension that strips any leading comment lines from a cell in which a cell magic is detected. Even with these improvements, the whole process is still really tedious. I also built a script that generates importable Python modules bundling arbitrary pip packages and other code, all crammed into a b64-encoded tar.gz string that automatically extracts itself and runs the pip install commands when imported.
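To make the comment-stripping idea concrete, here is a rough sketch of how such an extension might look. It is not the actual extension: the function name `strip_leading_comments` is made up for illustration, and it assumes a recent IPython (7 or later), where cleanup transforms receive and return a list of lines and are registered through `input_transformers_cleanup`.

```python
def strip_leading_comments(lines):
    """Drop leading comment/blank lines when a cell magic follows them.

    `lines` is the cell as a list of strings (each ending in a newline),
    which is the shape IPython passes to its cleanup transforms.
    """
    for i, line in enumerate(lines):
        if line.startswith("%%"):
            # Cell magic found: make it the first line of the cell.
            return lines[i:]
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # Real code comes before any magic: leave the cell untouched.
            break
    return lines


def load_ipython_extension(ipython):
    # Standard extension entry point, called by `%load_ext <module name>`.
    ipython.input_transformers_cleanup.append(strip_leading_comments)
```

Saved as a module on the import path and loaded with `%load_ext`, this would let a cell that starts with `# Now writing file XYZ` followed by `%%writefile XYZ` behave as if the comment weren't there.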
That's a long workaround and it really shouldn't be this difficult. I pivoted and am now writing a more generic interface for models to write, explore, edit, and run code and tests. I have several half-prototyped ideas, but it's starting to take shape. The main feature lets specific code chunks be referenced and then edited, removed, or appended to. The model specifies that it wants to replace some chunk of code, identified either by the qualified name of a class, function, or method, or by a section delimited by `# section: <name>` and `# endsection` comments. This way, it can replace just `Foo.NestedFooClass.bar_method` in xyz_module without redefining the whole module. There's still a bunch of other low-hanging fruit that I believe can significantly improve the whole process. Automated tests or other analysis like mypy/pyright could be run after an edit, documentation for the relevant objects could be shown after an uncaught exception, imports could be auto-added when missing and unambiguous, and better search and exploration tools could keep the model from printing out full codebases that hog precious context. Local models offer even more possibilities. For example, generation could be stopped early on syntax or name errors, and logit warping could boost tokens corresponding to names/keywords valid in the current scope. A lot of these things are already built, just packaged for humans instead (say, Jedi's autocomplete suggestions could be used to identify which tokens to boost). Right now, models are effectively stuck writing code in Notepad with a broken backspace key, and it could be a lot easier than that. |
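For a feel of what replacing a chunk by qualified name involves, here is a minimal sketch using only the standard library `ast` module. It is an illustration under assumptions, not the interface I'm building: `find_node` and `replace_definition` are hypothetical names, and it relies on `end_lineno`, so it needs Python 3.8 or later.

```python
import ast
import textwrap


def find_node(tree, qualname):
    """Walk nested class/function bodies to the node named by `qualname`."""
    node = tree
    for part in qualname.split("."):
        for child in node.body:
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)) \
                    and child.name == part:
                node = child
                break
        else:
            raise KeyError(qualname)
    return node


def replace_definition(source, qualname, new_code):
    """Return `source` with the definition at `qualname` swapped for `new_code`."""
    node = find_node(ast.parse(source), qualname)
    # Decorators sit above the `def`/`class` line, so include them in the span.
    start = min([node.lineno] + [d.lineno for d in node.decorator_list])
    lines = source.splitlines(keepends=True)
    first = lines[start - 1]
    indent = first[: len(first) - len(first.lstrip())]
    # Re-indent the replacement to match the definition it replaces.
    block = textwrap.indent(textwrap.dedent(new_code).strip("\n"), indent) + "\n"
    lines[start - 1 : node.end_lineno] = [block]
    return "".join(lines)


SOURCE = '''\
class Foo:
    class NestedFooClass:
        def bar_method(self):
            return 1

        def other(self):
            return 2
'''

NEW_METHOD = '''\
def bar_method(self):
    return 42
'''

updated = replace_definition(SOURCE, "Foo.NestedFooClass.bar_method", NEW_METHOD)
```

The model only has to emit the new `bar_method`, written at column zero. Everything else in the module, including `other`, is carried over untouched.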
I was also thinking about deeper notebook and LLM integration for linear scripts.