Show HN: Open Interpreter – CodeLlama in your terminal, executing code (openinterpreter.com)
82 points by killianlucas 1032 days ago
Hey HN. Over the summer I built Open Interpreter, a CLI that lets you ask Code-Llama or GPT-4 to write/run code.

It runs multiple languages (Python, Shell, HTML/CSS, and Node.js for now), then sends the output back to the language model.

It’s essentially an open-source, local implementation of OpenAI’s Code Interpreter. No limits on file size, runtime timeouts, or web access.

Everything is streamed beautifully, rendered with Markdown, and syntax highlighted.

Try it out and let me know what you tried! - Killian

7 comments

Congrats. Is there any way to provide feedback to the LLM? Is there any way to ask for approval before code execution?
Yes, you can iterate on the code being written by asking for changes, very much like ChatGPT. If it starts writing code you don't want, you can hit CTRL-C to stop generating, and give it your feedback right away.

And yes, `$: interpreter` runs in confirmation mode by default. You'll need to approve every block of code before it's run (bypass this with the `-y` flag).
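
A minimal sketch of what a confirm-before-run step like this could look like. This is not Open Interpreter's actual code: `run_with_confirmation`, `execute`, and `ask` are hypothetical names, and `auto_run` only stands in for the effect of the `-y` flag.

```python
from typing import Callable


def run_with_confirmation(
    code: str,
    execute: Callable[[str], None],
    auto_run: bool = False,
    ask: Callable[[str], str] = input,
) -> bool:
    """Show a code block and run it only after approval. Returns True if it ran."""
    print(code)
    if not auto_run:
        # Hypothetical prompt text; the real CLI may word this differently.
        answer = ask("Would you like to run this code? (y/n) ")
        if answer.strip().lower() != "y":
            return False  # declined: nothing is executed
    execute(code)
    return True
```

Passing `ask` and `execute` as parameters keeps the sketch testable without a terminal or a real interpreter behind it.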

Killian, you're killin it!
This looks very cool! Trying it out now.

Question: Is https://open-procedures.replit.app/ absolutely needed for OI to work?

Let me know how it goes!

It works without it, but it will struggle with those tasks (https://github.com/KillianLucas/open-procedures/tree/main/pr...) because of GPT-4's training data cutoff.

If you give it the right info (e.g. "use Selenium without web package manager") it works without it. Sometimes it experiments and figures it out anyway.

As I use it, I do find lots of tasks it fails at just because of the training data cutoff, so that db is just meant to provide it with up-to-date documentation.

Running OI in `--local` mode will disable it though!

The demo is on Mac, does this work on Windows or Linux?

How does it figure out how to use apps like email and calendar?

Looks like magic though.

Thank you so much!

Yes, both — anywhere you can run Python.

For app control, we pass some system information (like the OS) into the system message. The LLM then knows to use AppleScript for Mac, PowerShell for Windows, `cal` and `alpine` for Linux, etc.
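
As a rough illustration of passing system information into the system message, here is a sketch using the standard `platform` module. The function name and the message layout are assumptions made for this example, not Open Interpreter's real prompt.

```python
import platform
from typing import Optional


def build_system_message(base_prompt: str, os_name: Optional[str] = None) -> str:
    """Append basic system information to a base system prompt."""
    # platform.system() returns e.g. "Darwin", "Windows", or "Linux".
    os_name = os_name or platform.system()
    # Hypothetical layout: the model is left to infer suitable tools from the OS.
    return f"{base_prompt}\n\n[System Info]\nOperating system: {os_name}"
```

The `os_name` override is only there so the output can be checked for a platform other than the one you are on.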

This looks really fantastic. Also, how/who made that beautiful video?
Aw, thanks! I made it with Rotato, highly recommend that. For lots of the animation, I actually used Open Interpreter to put it together (e.g. "Make a square gif out of these images, cropping out the whitespace in each frame", etc.)

Let me know if you get the chance to try it!

I'd actually love to know how you made the website; it has the distinctly OpenAI look, with large clear lettering etc. Is that custom built or did you use a theme in a framework? Thanks
Thanks! Custom built, it's actually Tailwind CSS that I made on the https://play.tailwindcss.com/ playground lol. You can write HTML with Tailwind there, then just copy the "generated CSS" at the bottom, paste it into a style.css, paste your HTML into an index.html, then upload to GitHub Pages.

All static, no building stuff, just that playground + GitHub Pages.

The Open Interpreter site is actually open-sourced here if you want to copy it: https://github.com/KillianLucas/open-interpreter-website

Very interesting, thanks for the details!
video looks amazing... almost like magic
thanks! there's something magic about LLMs running code, I think.
That's nice! I've been thinking a lot about models running code lately, and using GPT-4 with its code interpreter to try out ideas quickly. Have you considered, as OpenAI does, using IPython for this?

What strikes me is that there isn't yet a great interface for models to explore and edit code. Using the interactive interpreter mode in ChatGPT is a rather frustrating cycle of trying something, it not working properly, and the model continuing to debug it. Each time it does, it has to rewrite the whole code again. Then it has to redefine even unchanged functions to update definitions that reference them. Half of the time, it redefines a class with "# ... (same as above)" and only after executing does it realize the other methods were actually needed, requiring another redefinition. This is quite wasteful, given the current technology's context size and output speed limitations.

I've been building some tools to make the process faster. Initially, I wrote some IPython extensions and tricked the model into installing them. For instance, IPython allows you to set up an extension that preprocesses cell input before running it. ChatGPT's interactive interpreter can use IPython magics like `%%writefile` or `%%bash`, but it runs into silly issues too often. It has a nearly irresistible urge to start the code cell with a comment spelling out it's now going to write file XYZ, which invariably breaks magic and means it can't write to files without having to make a big multiline string literal and writing to disk with code. To fix this, I made a quick IPython extension to strip off any leading comment lines in a cell where a cell magic was detected.
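
A sketch of how such an extension might be written, assuming IPython 7 or later, where `input_transformers_cleanup` accepts functions that take and return a list of lines. The function name is made up for this example, and this is not the commenter's actual extension.

```python
from typing import List


def strip_comments_before_magic(lines: List[str]) -> List[str]:
    """Drop leading comment/blank lines when the first real line is a cell magic."""
    for index, line in enumerate(lines):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # still in the leading run of comments and blank lines
        if stripped.startswith("%%"):
            return lines[index:]  # the cell magic becomes the first line
        break  # ordinary code comes first: leave the cell untouched
    return lines


def load_ipython_extension(ipython):
    # Cleanup transformers run on the raw cell before magics are detected.
    ipython.input_transformers_cleanup.append(strip_comments_before_magic)
```

With this loaded via `%load_ext`, a cell that starts with `# writing file XYZ` followed by `%%writefile XYZ` should be treated as if the comment were not there.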

Despite these improvements, the whole process is still really dull. I built a script to generate importable Python modules, including arbitrary pip packages and other code, all crammed into a b64-encoded tar.gz string that automatically extracts and runs the pip install commands when imported. That's a long workaround and it really shouldn't be this difficult.
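
The packing half of that workaround could be sketched roughly as follows, using only the standard library. The function names are hypothetical, and the step that runs the pip install commands on import is left out.

```python
import base64
import io
import os
import tarfile
from typing import Dict


def pack_files(files: Dict[str, str]) -> str:
    """Pack {relative_path: text} into a base64-encoded tar.gz string."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return base64.b64encode(buffer.getvalue()).decode("ascii")


def unpack_files(payload: str, dest: str) -> None:
    """Decode the string and write each regular file under dest."""
    raw = base64.b64decode(payload)
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Only meant for archives you built yourself: paths are not sanitized.
            target = os.path.join(dest, member.name)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as handle:
                handle.write(tar.extractfile(member).read())
```

The resulting string can be pasted into a single Python file, which is what makes it usable in a sandbox that only accepts code cells.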

I pivoted and am now writing a more generic interface for models to write, explore, edit, and run code and tests. I have several half-prototyped ideas, but it's starting to take shape. The main feature allows specific code chunks to be referenced and edited/removed/appended to. This is done by letting the model specify it wants to replace some chunk of code identified by a class/function/method qualified name, or by sections delimited by `# section: <name>` and `# endsection` comments. This way, it can replace just `Foo.NestedFooClass.bar_method` in xyz_module without redefining the whole module.
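
To make the qualified-name idea concrete, here is a small sketch built on the standard `ast` module (Python 3.8+ for `end_lineno`). It is only one possible approach; `find_definition` and `replace_definition` are names invented for this example, not part of the commenter's tool.

```python
import ast
import textwrap
from typing import Optional

DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def find_definition(tree: ast.AST, qualname: str) -> Optional[ast.AST]:
    """Follow a dotted name such as 'Foo.Nested.bar' through nested definitions."""
    node = tree
    for part in qualname.split("."):
        for child in getattr(node, "body", []):
            if isinstance(child, DEFINITIONS) and child.name == part:
                node = child
                break
        else:
            return None  # this part of the name was not found
    return node


def replace_definition(source: str, qualname: str, new_code: str) -> str:
    """Replace one class/function/method in source, leaving the rest untouched."""
    node = find_definition(ast.parse(source), qualname)
    if node is None:
        raise KeyError(qualname)
    lines = source.splitlines()
    # Include decorators, which sit above the def/class line.
    first = min([node.lineno] + [d.lineno for d in node.decorator_list]) - 1
    last = node.end_lineno
    indent = " " * node.col_offset
    body = textwrap.dedent(new_code).strip("\n").splitlines()
    new_lines = [indent + line if line else line for line in body]
    return "\n".join(lines[:first] + new_lines + lines[last:]) + "\n"
```

For example, `replace_definition(src, "Foo.Nested.bar", "def bar(self):\n    return 42\n")` rewrites only that method and re-indents the replacement to match its nesting level.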

Then there's still a bunch of other low-hanging fruit to implement that I believe can significantly improve the whole process. Automated tests or other analysis like mypy/pyright could be run after an edit, documentation could be provided for the relevant objects after an uncaught exception, imports could be auto-added when missing and unambiguous, and better search and exploration tools could be provided to avoid printing out full codebases hogging precious context. Local models offer even more possibilities. For example, early stopping on syntax or name errors could be implemented, along with logit warping to boost tokens corresponding to names/keywords valid in the current scope. A lot of these things are already built, just packaged for humans instead (say, the Jedi autocomplete suggestions could be used for identifying tokens to boost).

Right now, models are effectively stuck writing code in Notepad with a broken backspace key, and it could be a lot easier than that.

Do you have any demo or GitHub with your prototype? I've checked your blog but no recent posts.

I was also thinking about deeper notebook and LLM integration for linear scripts.

Same! One path forward with Open Interpreter is to make a "self-writing Jupyter notebook" with the same UI as Jupyter: edit code, rearrange it, export as a .py, etc. Any thoughts on how you'd do that deeper notebook/LLM integration, or what it might need?
dozens of great ideas in here pedro.

> The main feature allows specific code chunks to be referenced and edited/removed/appended to.

Do you know if it works to just include in the system message something like "When editing a function, don't rewrite the whole thing, be sure to edit modules piecemeal e.g. Foo.NestedFooClass.bar_method = ..."?

> imports could be auto-added when missing and unambiguous

GREAT idea. You're right, tons of low-hanging fruit here. A preprocess.py file should probably be added to Open Interpreter with all these ideas: what do LLMs tend to fail at that we can programmatically correct in each code block? Then just run that on each block it writes before asking the user for approval to run it.
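
One such preprocessing pass, the auto-import idea, might be sketched like this. `KNOWN_MODULES` is a made-up allowlist standing in for "unambiguous" names, and the name analysis ignores scoping, so treat it as an illustration rather than a complete solution.

```python
import ast
import builtins

# Hypothetical allowlist: names we are willing to import without asking.
KNOWN_MODULES = {"os", "sys", "re", "json", "math"}


def add_missing_imports(code: str) -> str:
    """Prepend `import x` for allowlisted names that are used but never bound."""
    bound = set(dir(builtins))
    used = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Name):
            # Names being read are "used"; names being assigned are "bound".
            (used if isinstance(node.ctx, ast.Load) else bound).add(node.id)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
    missing = sorted((used - bound) & KNOWN_MODULES)
    return "".join(f"import {name}\n" for name in missing) + code
```

A block such as `print(math.sqrt(4))` would come back with `import math` on top, while a block that already imports or redefines `math` is returned unchanged.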

> logit warping to boost tokens corresponding to names/keywords valid in the current scope

WOAH. Have you heard of outlines? (https://normal-computing.github.io/outlines/) Lets you put in any Regex you want, then as the LLM is generating, logit bias is used to conform the output to that Regex. Works for local models. I can totally imagine pairing some linting function with outlines to get models to write automatically linted code on the first pass.

For my thing, I abandoned IPython since I wanted a more uniform approach to multiple programming languages. Open Interpreter keeps a single subprocess open with `python -i` to get a live Python interpreter that displays output in real time. It acts the same as IPython, but I can just swap out that command (for example, with `node -i`) to get different languages. I think this also gets around the issues with cell magic, which, as you've said, might confuse it. Instead, we just ask the model to tell us what language it wants to use + the code it wants to run in a single function call.
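
The persistent-subprocess approach can be sketched as below. This is a simplified illustration, not Open Interpreter's implementation: the sentinel marker, the `exec(...)` wrapping that lets multi-line code go in as one line, and discarding stderr (where the `>>>` prompts and tracebacks go) are all assumptions made to keep the example short.

```python
import subprocess
import sys

SENTINEL = "__END_OF_OUTPUT__"  # hypothetical marker for "this block is done"


class PersistentInterpreter:
    """Keep one `python -i` process alive so state persists between blocks."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-i", "-q", "-u"],  # interactive, no banner, unbuffered
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,  # prompts and tracebacks are dropped here
            text=True,
            bufsize=1,
        )

    def run(self, code: str) -> str:
        # Send the whole block as a single line, then ask for the sentinel.
        self.proc.stdin.write(f"exec({code!r})\n")
        self.proc.stdin.write(f"print({SENTINEL!r})\n")
        self.proc.stdin.flush()
        output = []
        while True:
            line = self.proc.stdout.readline()
            if not line:  # the process exited
                break
            if SENTINEL in line:
                output.append(line.split(SENTINEL)[0])
                break
            output.append(line)
        return "".join(output)

    def close(self) -> None:
        self.proc.stdin.close()
        self.proc.stdout.close()
        self.proc.wait()
```

Because the process stays open, a variable defined in one `run` call is still there in the next. Swapping in another language's REPL would need its own way of wrapping code and printing the sentinel.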

Have thought a lot about letting the user edit code. The next version of Open Interpreter = an Electron app with a UI so you can edit code before it runs, better copy/paste, convo history, etc.

> with a broken backspace key

By the way, have you heard of this idea of training models to use a backspace key? Powerful stuff. We could think about implementing this in preprocess.py: another pass by GPT-4 or whatever to edit things it thinks aren't going to work. To the user it would just look like GPT-4 has a working backspace key.

Would love to see your GitHub on these if you're willing to share! Open Interpreter is MIT-licensed, so ideally I could make it attractive enough / similar enough to your project to simply join up / contribute some of these ideas.

This is absolutely slick. Well done!
Thanks so much!