Hacker News
Gorilla-CLI: LLMs for CLI including K8s/AWS/GCP/Azure/sed and 1500 APIs (github.com)
152 points by shishirpatil 1088 days ago
23 comments

Can't speak for this tool yet, but ChatGPT has been great for this use case. Sometimes a tool doesn't have a man page or has hard-to-navigate docs; asking "What does $flag mean in $tool" tends to work (when it doesn't hallucinate something totally wrong).

One recent example: "what does -w do on curl" .. not a single top 10 result on Google mentions it in the context, but GPT3.5 nails it in two seconds complete with a working example. I know "-w" is interpreted by Google in a special way, but frankly given the obvious context I shouldn't need to know how Google works. (Through experience I also know https://explainshell.com/explain?cmd=curl+-w will do a good job, but ChatGPT actually provides a working example which is even better.)

That said, I do think you need a good critical eye to use LLMs in this context. It's like relying on a calculator. You still need mental math skills to know that 91 * 10 can't equal 2511. Similarly, when GPT starts hallucinating, it helps if you have a high sensitivity to smelling it out.

I think spreadsheet is a slightly better analogy than calculator. The latter has well defined capabilities and essentially 100% accuracy within those bounds. A spreadsheet with a minor typo in one field can produce drastically incorrect results that appear fine to the untrained eye.
> It's like relying on a calculator. You still need mental math skills to know that 91 * 10 can't equal 2511. Similarly, when GPT starts hallucinating, it helps if you have a high sensitivity to smelling it out.

Well, at least my calculators don't have the error rate GPT4 still does. Especially for seemingly simple things like a command flag, I have zero trust that GPT won't give me something that will eventually erase all my data.

A calculator isn't going to give me an error. I don't need to fact check it constantly.
Maybe, but I suspect you're smart and have the intuition to "just know" if something went very wrong. For example, if you multiply 5 by some other integer and the result ends with a 2, you'd just feel it in your gut, no? This is not true for everyone!

It's not uncommon to have folks with reasonable math skills ending up behind a cash register and insisting they're right when such errors occur and I see a high likelihood of this happening with relying on "AI" too. Domain knowledge and good intuition will continue to be valuable.
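
The gut checks mentioned above (91 * 10 can't be 2511, a multiple of 5 can't end in 2) can be written down as a cheap last-digit test. A minimal sketch in Python; the function name `plausible_product` is made up for illustration, and the check only rules out one class of errors, it never proves a result correct.

```python
def plausible_product(a: int, b: int, claimed: int) -> bool:
    """Cheap sanity check: the last digit of a*b depends only on the last digits of a and b."""
    return (a % 10) * (b % 10) % 10 == claimed % 10


# 91 * 10 must end in 0, so 2511 is ruled out immediately.
print(plausible_product(91, 10, 2511))  # False
# A multiple of 5 ends in 0 or 5, never in 2.
print(any(plausible_product(5, n, 12) for n in range(10)))  # False
# The true product passes, but passing does not prove correctness.
print(plausible_product(91, 10, 910))  # True
```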

how often are you using LLMs?
Somehow I knew that this question would come up; questioning the "progress" makes me a heretic.

So for the last 2-3 months I've subscribed to ChatGPT4 (and much longer to Copilot), worked through most of the HN threads on tips and reviews and the posts I could find on "prompt engineering", and had hundreds of sessions with ChatGPT4. So, I still might have missed something, but I think I have a rather good idea of what's going on.

1. It's rather good with understanding what I want. I can dump pretty much anything into it and give it certain rules (things we described years ago as "Google fu" until Google SERP became useless) and it will make something out of it.

2. It's a nice rubberduck to discuss things and get a broad overview on certain topics.

3. It's amazingly stupid about the validity of its answers, even if I ask it for its confidence. It's like talking to an 8-year-old know-it-all: you have to fact check everything. If I confront it with the error, it even reacts like an 8-year-old.

4. Initial responses for intentionally broad topics (summed up with "give me ansible yaml to deploy wireguard to N servers") often don't work at all, and after an hour of query-response you're better off reading the ansible docs.

5. For intentionally specialized topics (summed up with "what's the fastest algorithm to sort this given x, y, z and bla will never be A"), its initial responses frequently contain good, sometimes surprisingly creative solutions.

All in all: Why oh why would I trade correctness for a significant error rate ("hallucination" is a word from SV marketing hell) and for debugging bullshit answers? Since debugging is already a big drag in programming, I need things I can trust to build more things on top of them. If I can't trust the "command" an LLM generates 100%, I'll never let it execute its code directly.

Thank you very much for your ChatGPT4 opinion. Do you think you can write your opinion about Copilot?
Agreed. For me, ChatGPT has been the killer app for terminal even as someone who’s pretty comfortable in the terminal
For me, there are multiple killer apps. One is in programming. In many cases, I work with a language I'm a bit unfamiliar with, so I ask it for idioms for things that I know a language should be able to do (like summing). Upon reading the code I see it's correct.

This used to be SO, but ChatGPT works way faster.

I use an alternative that just directly calls OpenAI using my API key. I have it mapped to the command `ai` and it works really really well. So far I've found no need for any intermediary or fancy features. It just shows the command with a (y/[N]) prompt and I can choose to run it or not.

I use the first tagged version of `aicmd` before it was given an unneeded intermediary: https://github.com/atinylittleshell/aicmd/tree/v1.0.2
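
For anyone curious what that show-then-confirm flow amounts to, here is a minimal sketch in Python. It is not the code of `aicmd` or Gorilla; the name `confirm_and_run` and the injectable `ask` parameter are assumptions made for illustration. The default answer is "no", matching the (y/[N]) prompt described above.

```python
import subprocess
from typing import Callable


def confirm_and_run(command: str, ask: Callable[[str], str] = input) -> bool:
    """Show the command and run it only on an explicit 'y' or 'yes'.

    Returns True if the command was executed, False otherwise.
    """
    answer = ask(f"{command}\nRun this command? (y/[N]) ").strip().lower()
    if answer not in ("y", "yes"):
        return False  # Empty input or anything else counts as "no".
    # shell=True runs the string as typed; only acceptable because the
    # user has just read and approved it.
    subprocess.run(command, shell=True, check=False)
    return True
```

Calling `confirm_and_run("ls -la")` prompts on the terminal; passing a different `ask` function makes the decision logic easy to test without a keyboard.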

I did this too, way cleaner and faster imo.
Thanks for sharing @iandanforth and @distortionfield, nice project! Are there any additional features Gorilla could support to make it more useful for you folks?
Would really like it if ML projects would declare upfront if you're using cloud models or local ones. A lot of work policies bar us from using externally generated code or inputting business data into external systems.

(as it happens, this one hits the cloud)

Thanks for the feedback, @satokema. We currently mention it in our GitHub, pypi readme, and at install. Would you prefer different wording?
It's not a local model. It queries some endpoint on someone else's computer.
Not local yet. Considering the LLM/generative AI velocity we’ve seen, it’s only a matter of time. It’s helpful to see what others build, providing signal it can be built.

If you’re not comfortable using it in your workflow, consider it a peek at what’s to come. Very exciting times. And it's open source.

Yep, some google cloud server:

    SERVER_URL = "http://34.135.112.197:8000"
Yes indeed. The models are too computationally expensive to run locally (7.5 billion parameters). Though you could in principle swap in any local model.
Do y'all have plans to release the model for those who have 16gb graphics cards? (I'm assuming the model is fp16?)
What are you talking about? 7b parameter models run insanely fast if you can offload to gpu, and are entirely reasonable speed if CPU only.
Does it prompt for an API key?
Nope. No API key needed since we mostly serve our own Gorilla models.
Loving it! Works nicely for k8s:

  (base)   ~ g get the image ids of all pods running in all namespaces in kubernetes
    kubectl get pods --all-namespaces -o jsonpath="{..imageID}"
  sha256:b19406328e70dd2f6a36d6dbe4e867b0684ced2fdeb2f02ecb54ead39ec0bac0 
  sha256:b19406328e70dd2f6a36d6dbe4e867b0684ced2fdeb2f02ecb54ead39ec0bac0
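
For anyone wondering what the `{..imageID}` part does: `..` is recursive descent in kubectl's JSONPath support, so it collects every `imageID` field at any depth of the response. A rough Python equivalent, assuming you had the JSON from `kubectl get pods --all-namespaces -o json`; the sample data and the helper name `find_key` are invented for illustration.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any depth, in nested dicts and lists."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Invented sample, shaped like a heavily trimmed pod list.
pods = {
    "items": [
        {"status": {"containerStatuses": [{"imageID": "sha256:aaa"}]}},
        {"status": {"containerStatuses": [{"imageID": "sha256:aaa"},
                                          {"imageID": "sha256:bbb"}]}},
    ]
}
print(list(find_key(pods, "imageID")))
# ['sha256:aaa', 'sha256:aaa', 'sha256:bbb']
```

Duplicates are kept, which is consistent with the same hash showing up twice in the output above.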
I recommend shell-gpt[1] for anyone with access to the OpenAI API. It works surprisingly well considering how simple it is. Be sure to browse the examples in the README.

[1] https://github.com/TheR1D/shell_gpt

newbie question.

Not to make this a debug thread but this is what I get when I try out gorilla

> gorilla I want to find my ip address

    /home/username/.local/lib/python3.10/site-packages/requests/__init__.py:102: RequestsDependencyWarning: urllib3 (1.26.7) or chardet (5.1.0)/charset_normalizer (2.0.7) doesn't match a supported version!
      warnings.warn("urllib3 ({}) or chardet ({})/charset_normalizer ({}) doesn't match a supported "
    Traceback (most recent call last):
      File "/home/username/.local/bin/gorilla", line 8, in <module>
        sys.exit(main())
      File "/home/username/.local/lib/python3.10/site-packages/go_cli.py", line 128, in main
        user_id = get_user_id()
      File "/home/username/.local/lib/python3.10/site-packages/go_cli.py", line 76, in get_user_id
        assert user_id != ""
    AssertionError

If you do "pip list --outdated" then it should show you which packages you have installed and that are out of date. Look out specifically for the packages that are mentioned in this error message: requests, urllib3, chardet, charset_normalizer.

You can then upgrade them by doing "pip install [package name here] --upgrade".
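
If you'd rather check from Python which versions are actually installed, a small sketch using only the standard library (Python 3.8+) could look like this. The helper name `installed_versions` is made up; the package list is the one from the warning. Note that this only addresses the RequestsDependencyWarning; the AssertionError is raised in `get_user_id` in `go_cli.py` and may well be a separate problem.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if it is not installed."""
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(
    ["requests", "urllib3", "chardet", "charset_normalizer"]
).items():
    print(f"{name}: {ver or 'not installed'}")
```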

Wait so it crashes or does it genuinely generate that as the answer?
Is this actually developed by UC Berkeley or just a project by one of their PhD students?
Most likely an underpaid PhD student.
Maybe [GPTCache](https://github.com/zilliztech/GPTCache) can make it more attractive, because similar problems can be less expensive, and can also be responded to faster. Of course, the specific configuration needs to be based on real usage scenarios.
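
To make the caching idea concrete, here is a deliberately simplified sketch in Python. It is not GPTCache's API: GPTCache matches semantically similar prompts, while this only matches prompts that are identical after normalizing case and whitespace. `PromptCache` and `fake_model` are hypothetical names, and `fake_model` stands in for the expensive model call.

```python
from typing import Callable, Dict, List


class PromptCache:
    """Cache responses keyed on a normalized prompt (exact match only)."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self._backend = backend  # Stand-in for the expensive model call.
        self._store: Dict[str, str] = {}
        self.hits = 0

    @staticmethod
    def _normalize(prompt: str) -> str:
        # Collapse whitespace and case so trivially different prompts share a key.
        return " ".join(prompt.lower().split())

    def ask(self, prompt: str) -> str:
        key = self._normalize(prompt)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        answer = self._backend(prompt)
        self._store[key] = answer
        return answer


calls: List[str] = []


def fake_model(prompt: str) -> str:
    calls.append(prompt)
    return f"answer #{len(calls)}"


cache = PromptCache(fake_model)
cache.ask("List all pods")
cache.ask("  list ALL pods ")
print(len(calls), cache.hits)  # 1 1
```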
Hey HN! As one of the authors of and contributors to Gorilla, I want to express our gratitude for your valuable feedback. The community's desire for a straightforward method to invoke Gorilla led to the development of this CLI tool. We appreciate your continued input, so please keep those suggestions coming!
Waiting for the LLM version of ye olden IRC hazing/trolling of: "oh you can fix that with rm -rf /"

Edit: Typo

You can't any more :) You need to add --no-preserve-root
Huh, interesting, I never knew that flag was added. Then again, I haven't tried to nuke / with rm in quite some time.
We've gone full circle: from efficient meta languages back to inefficient and ambiguous natural language.
I'm endlessly amused by the fact that among the first applications of LLMs were tools to summarise emails, accompanied by tools to write your emails based on a short description of what you want to say. So soon we'll effectively be communicating by text message, with the LLMs acting as a sort of anti-compression in between.

My brother lives in Japan, and he recently had to write a lot of emails to the company renovating his apartment. He said that ChatGPT was a lifesaver there since at least 75% of semi-formal (i.e. between customer and company) Japanese emails is formality and filler. He just skipped all of that and ChatGPT wrote it for him.

That's actually a very sensible use case!
I think the use cases are where you roughly remember the commands but not the full command. Like rather than you looking up a man page you could potentially use this. An expert in one field may not be in another, so they might also find this useful!
Agreed; this is somewhat useful for beginners but incredibly silly for professionals.
Is it weird I'm kind of more interested in this included "go_questionary" library?
Actually I was also recently looking for a good Python library to make nice terminal UIs, for example to present a list of choices and let the user use arrow keys or integers to select options, and ran into Questionary (Python version). A similar interface is used by the excellent “gh” (GitHub) CLI tool; not sure which library they use (it may not even be a Python lib).

Do folks recommend Questionary as a good Python command line UI tool? I really like Rich but it does not have this type of selection UI.

haha. It's a great Open source project!
Hi HN, I'm one of the authors from the Gorilla Project. Gorilla is now available as a CLI, and you can interact with your laptop in English! Feedback and suggestions are very welcome!!
A flag for printing the chosen command to stdout instead of executing it in a subprocess would be helpful.

Also I am finding in my environment that longer results don't line wrap and so it's hard to tell what the actual full command is, but that might be just me.
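
A sketch of what such a flag could look like, in Python with `argparse`. To be clear, `--print-only` is a hypothetical flag name and this is not Gorilla's code; it only illustrates the print-versus-execute split being asked for.

```python
import argparse
import subprocess


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Run or print a suggested command.")
    parser.add_argument("command", help="the command chosen by the user")
    # Hypothetical flag; not an existing Gorilla CLI option.
    parser.add_argument("--print-only", action="store_true",
                        help="write the command to stdout instead of executing it")
    args = parser.parse_args(argv)

    if args.print_only:
        print(args.command)  # Goes to stdout so it can be piped or captured.
        return 0
    # Otherwise execute it and pass the exit status through.
    return subprocess.run(args.command, shell=True).returncode


main(["echo hello", "--print-only"])  # prints the command, runs nothing
```

Printing to stdout rather than stderr is the important bit: it lets you do things like pipe the suggestion into a clipboard tool or a script.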

Oo good suggestion @razzypitaz. We'll try to incorporate this in the next release :) BTW it's open source, so if you're interested in raising a PR, we'd love to have you as a contributor!
It runs in the terminal, so you can also CTRL-r it!
How does this compare with github-copilot-cli?
It’s very sketchy that they use stderr and queries for training.

Don’t pass anything sensitive into this program!

Hey @linuxdude314, thank you for the comment. As we mentioned, commands are executed solely with your explicit approval. And while we utilize queries and error logs (stderr) for model enhancement, we NEVER collect output data (stdout). This is a stronger guarantee than many of the other LLMs out there, and our goal is for this data to inform our research.

One of the reasons we open-sourced the front-end is that if you would like to keep everything private, you can just clone the repo, comment out the logging, install it, and we will still honor and serve your queries if you hit our hosted end-point :) Let us know if there is anything more that we can do to make you comfortable using our tool!
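
For readers trying to picture the stdout/stderr distinction, here is a minimal Python sketch of running a command so that stdout goes straight to the terminal and only stderr is captured. This is an illustration under assumptions, not the actual Gorilla front-end code; `run_capture_stderr` is a made-up name.

```python
import subprocess
from typing import Tuple


def run_capture_stderr(command: str) -> Tuple[int, str]:
    """Run a command; stdout is inherited by the terminal, only stderr is captured."""
    result = subprocess.run(
        command,
        shell=True,
        stderr=subprocess.PIPE,  # Captured, so a wrapper could choose to log it.
        text=True,               # stdout is not redirected, so it is never read here.
    )
    return result.returncode, result.stderr


code, err = run_capture_stderr("echo visible; echo oops >&2; exit 1")
print(code, repr(err))  # 1 'oops\n'
```

In a wrapper like this, keeping things private would amount to discarding the captured string instead of sending it anywhere.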

Giving anyone who can type full system admin abilities without any need for training?

What could go wrong?

No commands are executed without the user's explicit approval! So, you could always execute any command anyway if you had access to the terminal!
I would encourage authors to update README with more representative examples.
Thanks for the feedback, @dievskly. Will update it in the next release!
very cool. But like many other uses of LLM it can hallucinate and/or produce a wrong result. For example I tried:

"gorilla dry run of brew upgrade"

And got a response that didn't work.

Thanks @ofermend, we believe that Gorilla will hallucinate less than other models, but it's not zero yet! We will continue to reduce hallucination. Thanks for the feedback!
Geez, I don't understand most of the words in that headline...
Doesn't this violate the Gorilla glue's trademark?
Why? There are lots of other products named Gorilla: Gorilla Glass (breakage-resistant glass for phones) and Gorilla Wear (clothing). Trademarks are only relevant in a particular field. Computer utilities aren't adhesives, glass, or clothing.
goodbye Fig!