Hacker News
Show HN: GPTDuck – Ask questions about any GitHub repo (gptduck.com)
128 points by penguinten 1265 days ago
21 comments

I thought I'd throw it a curveball:

https://github.com/chbarts/particle-abcs

And my question:

> What language is this written in?

The answer:

> This appears to be a poem written in English. It is written in a form that is often used for teaching purposes, in which each letter of the alphabet is associated with a concept related to a particular field of study. In this case, the poem is about quantum field theory and particle physics. The poem is written in a rhyming couplet form and contains many scientific terms that are specific to the field.

Which is entirely correct. I'm impressed.

How curious: I have a small children's book I picked up at the California Academy of Sciences entitled, "The ABCs of particle physics," made by a joint collaboration between Fermilab and SLAC [1], and here I find another poem in the same vein!

[1] https://www.symmetrymagazine.org/article/the-abcs-of-particl...

Wow. I googled part of your poem to see if maybe it had been on the web previously and GPT was regurgitating text around it (e.g., analysis, comments).

I am very impressed with your test - and the result.

I imagine that it might already know how to answer that question for non-GitHub-related texts, which presumably are often not programs. I'd be more interested to see how it responds to "What programming language is this written in?" to see if it will question the premise of the question (although there's always the chance that it might make an educated guess that the poem is valid Perl!).
The fact that it also gave the backstory on this being a known artistic tactic is truly awesome.
Hey guys - creator here. Have hit the GitHub rate limit and it's currently not working! Am trying to fix! Thanks for all the support!
Might be worth asking users to "login with GitHub" to get past the rate limits (the anonymous rate limits are lowwww like 10 per hour).

Logging in with GitHub is still lowish but like 1k per hour or so. (IIRC)
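
For anyone who wants to check the real numbers rather than trust memory: GitHub's REST API reports the current quota in the X-RateLimit-* response headers. Below is a minimal sketch, assuming you already have the response headers as a dict from whatever HTTP client you use. The helper names (github_headers, parse_rate_limit) are made up for illustration and are not from GPTDuck.

```python
from datetime import datetime, timezone


def github_headers(token=None):
    """Build request headers; a personal access token raises the quota."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def parse_rate_limit(response_headers):
    """Read the quota GitHub reports back on each API response."""
    # Header names are case-insensitive, so normalise them first.
    lowered = {k.lower(): v for k, v in response_headers.items()}
    return {
        "limit": int(lowered["x-ratelimit-limit"]),
        "remaining": int(lowered["x-ratelimit-remaining"]),
        # The reset value is a Unix timestamp in UTC seconds.
        "reset_at": datetime.fromtimestamp(
            int(lowered["x-ratelimit-reset"]), tz=timezone.utc
        ),
    }
```

If "remaining" is 0, the server can wait until "reset_at" or ask the user for a token, which is roughly what the fix further down the thread does.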

I didn't expect this many people to try it out haha - with hindsight, absolutely should have
There’s no way I’m logging into rando apps using my GitHub id. Aside from their bad UX and the potential to accidentally over-grant permissions, the app could abuse GitHub ToS with my credentials and get me banned.

Not worth the risk just to play around with a site.

Funny that GitHub is the bottleneck rather than OpenAI for once.
Do the request from the browser and that won't happen because the request source will be distributed.
Need to download the files to the server to generate the embeddings!
Update: just shipped a fix for the rate limits - you will need to sign up for an account and put your own personal access token in.
The sign-up form doesn't seem to allow passwords with non-alphanumeric characters in them. Ironically, when it sees one, it complains that "password must have at least one number and at least one letter" (even though it already does) - I suspect that's just a catch-all message for invalid passwords?

In any case, the restriction makes it incompatible with pretty much all password generators, including the built-in ones in modern browsers. To improve matters, I would suggest dropping all restrictions on what cannot be included altogether.
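
To make the suggestion concrete, here is a rough sketch of a validator that only checks for what must be present and never rejects a character for being "special". The function name and the minimum length of 8 are my own assumptions, not the site's actual rules.

```python
import re


def validate_password(password, min_length=8):
    """Return a list of problems; an empty list means the password is fine.

    Only positive requirements are checked, so symbols, spaces and
    non-ASCII characters from password generators are all accepted.
    """
    errors = []
    if len(password) < min_length:
        errors.append(f"password must be at least {min_length} characters long")
    if not re.search(r"[A-Za-z]", password):
        errors.append("password must have at least one letter")
    if not re.search(r"\d", password):
        errors.append("password must have at least one number")
    return errors
```

Returning one message per failed rule also avoids the misleading catch-all error described above.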

Had a broken validator - fixed!
shakes fist at Microsoft
How are you avoiding hitting the rate limit for the OpenAI APIs?
This appears to be quite nice assuming it works, but I wonder: how does this work?

I thought ChatGPT could only process a very limited context (a few thousand words or so) and this claims to work for repositories <100 MB, which is gigantic in comparison.

The site says: "we will download [the repo] to the server and create embeddings against the code". What kind of embeddings are these?

It uses the OpenAI embedding models to get around the context window problem. You can compress large amounts of text/code into just a few tokens.
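
For readers wondering what that can look like in practice, here is a minimal sketch of the usual retrieval pattern, and I can't say whether GPTDuck does exactly this. Each chunk of code is turned into a vector, the question is turned into a vector too, and only the most similar chunks are pasted into the prompt, so the model never sees the whole repo. toy_embed below is a bag-of-words stand-in I made up so the example runs offline. A real system would call an embedding model there instead.

```python
import math
import re
from collections import Counter


def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Put only the retrieved chunks into the prompt, not the whole repo."""
    context = "\n---\n".join(top_k(question, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

The prompt stays small no matter how large the repository is, because its size depends on k and the chunk length rather than on the total amount of code.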
Can you elaborate on this? What is the input to the GPT in that case? I was under the impression that GPT is given an array of tokens and it produces one token as its output.
Would love some more information on how to do this. Can you share details or point to a resource?
I'm curious if you can share LoC -> token count metrics?
I don't know about embeddings, but it was able to decode a URL-encoded string into a JavaScript-animated SVG. So larger texts can be encoded and still "understood." You probably wouldn't even have to declare your encoding method. https://www.tiktok.com/@y.i.t.z.i/video/7184820436839042306
"Something went wrong, please try again later. If the problem persists, please contact us."

Tried several times.

Most likely HN kiss of death, their backend [1] returned 500

[1] https://gptduck-production.up.railway.app/query

Was kinda expecting this but then again it was still only under "new" with 3 votes, so I had doubts that it was maybe just me.
Just came here to post this also!
I tried it for my (incomplete) compiler project[0], and the results are quite funny:

> This program prints a large string of text to the console. It does this by using a String variable called myLongString, which is initialized with a long string of text. The program begins with the line "program LargeString {" and ends with the line "}". Inside the program, there is a comment that reads "Now let's try a comment". This comment is not necessary for the program to work, but it helps to explain what the program does. The program then uses the write() function, which is a function that prints a given string to the console, to print the large string of text.

It seems as though it's looking at the sample code files used for testing rather than the actual compiler codebase.

[0] - https://github.com/JLCarveth/zpl-compiler

I'm impressed you got it to do anything. Did it work first time or did you keep clicking 'Submit' until you got a response?
It worked for the code for my personal website at ca.lebe.gg

https://github.com/calebegg/website

It's very impressive!

When it works it’s impressive, but sometimes it just makes things up:

For one of my repos: https://github.com/drpriver/macwatch

> Does this program work on windows?

No, macwatch is a program designed specifically for Mac computers and is not compatible with Windows systems. The program is designed to provide users with a comprehensive overview of their Mac's performance, including CPU usage, memory usage, disk usage, and more. It can also be used to monitor network activity and alert users when a process is consuming too much system resources. Additionally, the code provided is written in Objective-C and C, which are not supported on Windows systems.

---

The first sentence is correct, the rest is just completely wrong.

This is a general problem with GPT models.

When it works, it’s flawless. When it doesn’t work, it just makes stuff up that kind of looks good, just to make you happy. And there is really no way to tell from the output…

Pretty funny how it seems to just have invented something based on nothing more than the name macwatch. The last sentence is remarkably strange, like it's just riffing on the idea of "not supported on Windows" and decided to throw in some random facts about the repo, and wound up with a totally false statement.
Great project!!

I was building something along similar lines. It is a semantic search engine for codebases using OpenAI's embeddings and FAISS: https://github.com/rahuldan/codesearch

It fetches the functions in the code that are relevant to a query.

I've been thinking about something similar. GPT-3 has dated knowledge on a lot of libraries. If you could feed it the latest source code and docs pages, would it be able to help you write code faster?

As others have mentioned, it seems down at the moment. But will be curious to give this a go and see how well it works.

This tool that allows you to add in a web scrape to a ChatGPT prompt is quite cool: https://twitter.com/DataChaz/status/1610556519531089921
Ooh, that is quite interesting. Thanks for sharing.
Is it broken?

I tried adding https://github.com/torvalds/linux and asking it to write a simple kernel module for me, but it gave me this error:

"Something went wrong, please try again later. If the problem persists, please contact us."

Edit: looks like it is, I see it's making a POST request to https://gptduck-production.up.railway.app/query which responds with a 500 Internal Server Error when I use a URL that looks like a GitHub repository (whether it actually exists or not), and "invalid repo" otherwise.

Heya - sign up for an account and put in your personal access token, it will work then! Hitting GitHub rate limits atm!
Interesting idea but backend is returning 500. Presumably killed by HN.
How did you implement this? Would love to learn more.
From what I can see, they download the repo to their server, generate per-function semantic embeddings (which will no doubt cost them at scale) [1], then use some combination of GPT-3 and Codex prompted with something like "Here's the source code of a git repo. Answer the following questions about it: {USER_QUESTION}."

[1] https://beta.openai.com/docs/guides/embeddings

Thanks for sharing this!

I found this guide/example from OpenAI which was pretty clear:

https://github.com/openai/openai-cookbook/blob/main/examples...

I tried but got "Something went wrong, please try again later. If the problem persists, please contact us."
"Something went wrong, please try again later. If the problem persists, please contact us."
This is a good idea, I just wish it actually worked. I've only been getting errors since trying it out.

"'NoneType' object has no attribute 'similarity_search'. Try logging in and entering your own Github access token."

I'm getting invalid repository for my ideas repository. Does it process README.md?

I'm trying to ask ChatGPT what the core insight is to my repository

https://GitHub.com/samsquire/ideas4

It seems to have some issue. I asked about a repo (the hive / cortex) and it returned:

It is difficult to answer this question without knowing more about the context provided. It appears to be a PGP public key block, but without additional information it is impossible to answer the question.

It seems it might not be able to read it? (It's got a normal README and docs, etc.)

I tried asking a simple "How does ActionMailer work?" with the Rails repository link, and I'm receiving an error response.
Aww, it might have been HN-hugged. I was hoping it could answer a question I have about the ScummVM codebase. :/
This is awesome, thanks for sharing! Will you share the source so we can run this locally?
Is there something like this but for PDFs?