Show HN: GPTDuck – Ask questions about any GitHub repo | HN Mirror

	Show HN: GPTDuck – Ask questions about any GitHub repo (gptduck.com)
	128 points by penguinten 1265 days ago

21 comments

msla 1265 days ago

I thought I'd throw it a curveball:

https://github.com/chbarts/particle-abcs

And my question:

> What language is this written in?

The answer:

> This appears to be a poem written in English. It is written in a form that is often used for teaching purposes, in which each letter of the alphabet is associated with a concept related to a particular field of study. In this case, the poem is about quantum field theory and particle physics. The poem is written in a rhyming couplet form and contains many scientific terms that are specific to the field.

Which is entirely correct. I'm impressed.

sandkoan 1265 days ago

How curious: I have a small children's book I picked up at the California Academy of Sciences entitled, "The ABCs of particle physics," made by a joint collaboration between Fermilab and SLAC [1], and here I find another poem in the same vein!

[1] https://www.symmetrymagazine.org/article/the-abcs-of-particl...

czbond 1265 days ago

Wow. I googled part of your poem to see if maybe it had been on the web previously and GPT was regurgitating text around it (eg: analysis, comments).

I am very impressed with your test - and the result.

saghm 1265 days ago

I imagine that it might already know how to answer that question for non-GitHub-related texts, which presumably are often not programs. I'd be more interested to see how it responds to "What programming language is this written in?" to see if it will question the premise of the question (although there's always the chance that it will make an educated guess that the poem is valid Perl!)

jfernandez 1264 days ago

The fact that it also gave the backstory on this being a known artistic tactic, truly awesome

penguinten 1265 days ago

Hey guys - creator here. We've hit the GitHub rate limit and the site is currently not working! Am trying to fix! Thanks for all the support!

freeqaz 1265 days ago

Might be worth asking users to "login with GitHub" to get past the rate limits (the anonymous rate limits are lowwww like 10 per hour).

Logging in with GitHub is still lowish but like 1k per hour or so. (IIRC)
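
The exact quota can be read from the API instead of recalled from memory: GitHub's REST API reports it in response headers. A minimal sketch, assuming the `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `X-RateLimit-Reset` headers; the helper names and the token value are hypothetical and nothing here is taken from GPTDuck's code:

```python
import time


def parse_rate_limit(headers):
    """Extract quota info from GitHub-style rate-limit response headers."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {
        "limit": int(lowered["x-ratelimit-limit"]),
        "remaining": int(lowered["x-ratelimit-remaining"]),
        "reset_epoch": int(lowered["x-ratelimit-reset"]),
    }


def seconds_until_reset(info, now=None):
    """Return how long to wait before the quota resets (never negative)."""
    now = time.time() if now is None else now
    return max(0.0, info["reset_epoch"] - now)


def auth_headers(token=None):
    """Build request headers; supplying a token raises the available quota."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

A server could call `seconds_until_reset` before retrying rather than failing every request once `remaining` reaches zero.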

penguinten 1265 days ago

I didn't expect this many people to try it out haha - with hindsight, absolutely should have

prepend 1265 days ago

There’s no way I’m logging into rando apps using my GitHub id. Aside from their bad UX and the potential to accidentally over grant permissions, the app could abuse GitHub ToS with my credentials and get me banned.

Not worth the risk just to play around with a site.

sebzim4500 1265 days ago

Funny that GitHub is the bottleneck rather than OpenAI for once.

BugsJustFindMe 1265 days ago

Do the request from the browser and that won't happen because the request source will be distributed.

penguinten 1265 days ago

Need to download the files to the server to generate the embeddings!

penguinten 1265 days ago

Update: just shipped a fix for the rate limits - you will need to sign up for an account and put your own personal access token in.

int_19h 1265 days ago

The sign-up form doesn't seem to allow passwords with non-alphanumeric characters in them. Ironically, when it sees one, it complains that "password must have at least one number and at least one letter" (even though it already does) - I suspect that's just a catch-all message for invalid passwords?

In any case, the restriction makes it incompatible with pretty much all password generators, including the built-in ones in modern browsers. To improve matters, I would suggest dropping all restrictions on what cannot be included altogether.
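
A validator along these lines would keep the "one letter, one number" rule without rejecting symbols. This is only a sketch: the function name and the `min_length` default are assumptions, not the site's actual rules.

```python
import re


def password_problems(password, min_length=8):
    """Return a list of problems; an empty list means the password is accepted."""
    problems = []
    if len(password) < min_length:
        problems.append(f"must be at least {min_length} characters")
    # Require the presence of a letter and a digit, but never reject
    # a password for containing other characters.
    if not re.search(r"[A-Za-z]", password):
        problems.append("must contain at least one letter")
    if not re.search(r"[0-9]", password):
        problems.append("must contain at least one number")
    return problems
```

Returning one message per failed rule also avoids the misleading catch-all error: a password like `abc123!@#$` yields no problems at all.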

penguinten 1265 days ago

Had a broken validator - fixed!

moffkalast 1265 days ago

shakes fist at Microsoft

moneywoes 1265 days ago

How are you avoiding hitting the rate limit for the OpenAI APIs?

wizeman 1265 days ago

This appears to be quite nice assuming it works, but I wonder: how does this work?

I thought ChatGPT could only process a very limited context (a few thousand words or so) and this claims to work for repositories <100 MB, which is gigantic in comparison.

The site says: "we will download [the repo] to the server and create embeddings against the code". What kind of embeddings are these?

penguinten 1265 days ago

It uses the OpenAI embedding models to get around the context window problem. You can compress large amounts of text/code into just a few tokens.
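
One common way embeddings are used for this, which may or may not match what GPTDuck does internally, is retrieval: each chunk of the repo is embedded once, the question is embedded at query time, and only the closest chunks are placed in the prompt. A minimal sketch of that pattern; `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and every name below is hypothetical:

```python
import math
import re
import zlib

DIM = 256  # toy vector size; real embedding models use far larger vectors


def toy_embed(text):
    """Stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9_]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity; inputs are unit length, so a dot product suffices."""
    return sum(x * y for x, y in zip(a, b))


def top_chunks(question, chunks, k=2):
    """Return the k chunks whose vectors are closest to the question's vector."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Only the retrieved chunks go into the prompt, not the whole repo."""
    context = "\n---\n".join(top_chunks(question, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Under this pattern the language model still receives ordinary text tokens; the vectors only decide which small slice of a large repo fits into the context window.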

LunaticDragon 1265 days ago

Can you elaborate on this? What is the input to the GPT in that case? I was under the impression that GPT is given an array of tokens and it produces one token as its output.

iamflimflam1 1265 days ago

Would love some more information on how to do this. Can you share details or point to a resource?

cipherpro 1265 days ago

https://beta.openai.com/docs/guides/embeddings/use-cases

joshka 1265 days ago

I'm curious if you can share LoC -> token count metrics?

realYitzi 1265 days ago

I don't know about embeddings, but it was able to decode a URL encoded string into a javascript animated SVG. So larger texts can be encoded and still "understood." You probably wouldn't even have to declare your encoding method. https://www.tiktok.com/@y.i.t.z.i/video/7184820436839042306

iforgotpassword 1265 days ago

"Something went wrong, please try again later. If the problem persists, please contact us."

Tried several times.

syx 1265 days ago

Most likely HN kiss of death, their backend [1] returned 500

[1] https://gptduck-production.up.railway.app/query

iforgotpassword 1265 days ago

Was kinda expecting this, but then again it was still only under "new" with 3 votes, so I wondered whether it was just me.

MoOmer 1265 days ago

Just came here to post this also!

JLCarveth 1265 days ago

I tried it for my (incomplete) compiler project[0], and the results are quite funny:

> This program prints a large string of text to the console. It does this by using a String variable called myLongString, which is initialized with a long string of text. The program begins with the line "program LargeString {" and ends with the line "}". Inside the program, there is a comment that reads "Now let's try a comment". This comment is not necessary for the program to work, but it helps to explain what the program does. The program then uses the write() function, which is a function that prints a given string to the console, to print the large string of text.

It seems as though it's looking at the sample code files used for testing rather than the actual compiler codebase.

[0] - https://github.com/JLCarveth/zpl-compiler

sebzim4500 1265 days ago

I'm impressed you got it to do anything. Did it work first time or did you keep clicking 'Submit' until you got a response?

calebegg 1265 days ago

It worked for the code for my personal website at ca.lebe.gg

https://github.com/calebegg/website

It's very impressive!

david2ndaccount 1265 days ago

When it works it’s impressive, but sometimes it just makes things up:

For one of my repos: https://github.com/drpriver/macwatch

> Does this program work on windows?

No, macwatch is a program designed specifically for Mac computers and is not compatible with Windows systems. The program is designed to provide users with a comprehensive overview of their Mac's performance, including CPU usage, memory usage, disk usage, and more. It can also be used to monitor network activity and alert users when a process is consuming too much system resources. Additionally, the code provided is written in Objective-C and C, which are not supported on Windows systems.

---

The first sentence is correct, the rest is just completely wrong.

shp0ngle 1265 days ago

This is a general problem with GPT models.

When it works, it’s flawless. When it doesn’t work, it just makes stuff up that kind of looks good, just to make you happy. And there is really no way to tell from the output…

TillE 1265 days ago

Pretty funny how it seems to just have invented something based on nothing more than the name macwatch. The last sentence is remarkably strange, like it's just riffing on the idea of "not supported on Windows" and decided to throw in some random facts about the repo, and wound up with a totally false statement.

rahuldan 1263 days ago

Great project!!

I was building something along similar lines. It is a semantic search engine for codebases using OpenAI's embeddings and FAISS: https://github.com/rahuldan/codesearch

It fetches the functions in the code for a query

dinkleberg 1265 days ago

I've been thinking about something similar. GPT-3 has dated knowledge on a lot of libraries. If you could feed it the latest source code and docs pages, would it be able to help you write code faster?

As others have mentioned, it seems down at the moment. But will be curious to give this a go and see how well it works.

brianjking 1265 days ago

This tool that allows you to add in a web scrape to a ChatGPT prompt is quite cool: https://twitter.com/DataChaz/status/1610556519531089921

dinkleberg 1265 days ago

Ooh, that is quite interesting. Thanks for sharing.

feklest 1265 days ago

Is it broken?

I tried adding https://github.com/torvalds/linux and asking it to write a simple kernel module for me, but it gave me this error:

"Something went wrong, please try again later. If the problem persists, please contact us."

Edit: looks like it is, I see it's making a POST request to https://gptduck-production.up.railway.app/query which responds with a 500 Internal Server Error when I use a URL that looks like a GitHub repository (whether it actually exists or not), and "invalid repo" otherwise.

penguinten 1265 days ago

Heya - sign up for an account and put in your personal access token, it will work then! Hitting GitHub rate limits atm!

sebzim4500 1265 days ago

Interesting idea but backend is returning 500. Presumably killed by HN.

mrg3_2013 1265 days ago

How did you implement this? Would love to learn more

sandkoan 1265 days ago

From what I can see, they download the repo to their server, generate per-function semantic embeddings (which will no doubt cost them at scale) [1], then use some combination of GPT-3 and Codex prompted with something like "Here's the source code of a git repo. Answer the following questions about it: {USER_QUESTION}."

[1] https://beta.openai.com/docs/guides/embeddings
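
The "per-function" step could look roughly like the following for Python sources. This is a sketch based on a guess about the pipeline, not GPTDuck's code; `function_chunks` is a hypothetical helper, and other languages would need their own parsers.

```python
import ast


def function_chunks(source):
    """Yield (name, source_text) for every function or method in a Python module."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # get_source_segment returns the exact text spanned by the node.
            yield node.name, ast.get_source_segment(source, node)
```

Each yielded snippet would then be embedded separately, so a question can be matched against individual functions rather than whole files.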

gregsadetsky 1265 days ago

Thanks for sharing this!

I found this guide/example from OpenAI which was pretty clear:

https://github.com/openai/openai-cookbook/blob/main/examples...

mrg3_2013 1265 days ago

I tried but got "Something went wrong, please try again later. If the problem persists, please contact us."

replwoacause 1265 days ago

"Something went wrong, please try again later. If the problem persists, please contact us."

replwoacause 1264 days ago

This is a good idea, I just wish it actually worked. I've only been getting errors since trying it out.

"'NoneType' object has no attribute 'similarity_search'. Try logging in and entering your own Github access token."

samsquire 1265 days ago

I'm getting invalid repository for my ideas repository. Does it process README.md?

I'm trying to ask ChatGPT what the core insight is to my repository

https://GitHub.com/samsquire/ideas4

sim7c00 1265 days ago

It seems to have some issue. I asked about a repo (the hive / cortex) and it returned:

It is difficult to answer this question without knowing more about the context provided. It appears to be a PGP public key block, but without additional information it is impossible to answer the question.

It seems it might not be able to read it? (It's got a normal README, docs, etc.)

jcolella 1265 days ago

I tried asking a simple "How does ActionMailer work?" with the Rails repository link, and I'm receiving an error response.

ravenstine 1265 days ago

Aww, it might have been HN-hugged. I was hoping it could answer a question I have about the ScummVM codebase. :/

iraliaf 1265 days ago

This is awesome, thanks for sharing! Will you share the source so we can run this locally?

raybb 1265 days ago

Is there something like this but for PDFs?