Hacker News
by wizeman 1266 days ago
This appears to be quite nice assuming it works, but I wonder: how does this work?

I thought ChatGPT could only process a very limited context (a few thousand words or so) and this claims to work for repositories <100 MB, which is gigantic in comparison.

The site says: "we will download [the repo] to the server and create embeddings against the code". What kind of embeddings are these?

2 comments

It uses the OpenAI embedding models to get around the context window problem. You can compress large amounts of text/code into just a few tokens.
Can you elaborate on this? What is the input to the GPT in that case? I was under the impression that GPT is given an array of tokens and it produces one token as its output.
Would love some more information on how to do this. Can you share details or point to a resource?
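For anyone asking how embeddings could get around the context limit, here is a minimal sketch of one common pattern (retrieve the relevant chunks first, then prompt with only those). This is an assumption about how a tool like this might work, not a description of the site's actual implementation. In this pattern the embeddings are never fed to GPT as tokens; they are only used to rank chunks of the repo against the question, and the text of the best-matching chunks is pasted into an ordinary prompt. The `toy_embed` function is a hypothetical stand-in (a plain word-count vector) for a real embedding model, and `top_chunks` and `build_prompt` are made-up helper names.

```python
from collections import Counter
import math
import re


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: a sparse word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the question's."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble a prompt that contains only the retrieved chunks, not the whole repo."""
    context = "\n---\n".join(top_chunks(question, chunks, k))
    return f"Answer using only this code:\n{context}\n\nQuestion: {question}"


# A tiny made-up "repository", already split into chunks.
chunks = [
    "def parse_config(path): return json.load(open(path))",
    "def send_email(to, subject, body): smtp.sendmail(to, subject, body)",
    "def resize_image(img, width, height): return img.resize((width, height))",
]

question = "How do I send an email with a subject?"
prompt = build_prompt(question, chunks, k=1)
print(prompt)
```

With a real embedding model the ranking would capture meaning rather than exact word overlap, but the shape is the same: the repo can be arbitrarily large because only the top few chunks ever reach the model's context window.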
I'm curious if you can share LoC -> token count metrics?
I don't know about embeddings, but it was able to decode a URL-encoded string into a JavaScript-animated SVG. So larger texts can be encoded and still "understood." You probably wouldn't even have to declare your encoding method. https://www.tiktok.com/@y.i.t.z.i/video/7184820436839042306
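For reference, this is what URL encoding (percent-encoding) looks like, using Python's standard `urllib.parse`. The SVG fragment is a made-up example, not the string from the video. Note that the encoded form is longer than the original, so this kind of encoding changes the representation rather than compressing it.

```python
from urllib.parse import quote, unquote

# Made-up SVG fragment, used only for illustration.
svg = '<circle r="5"/>'

# safe="" forces "/" to be percent-encoded as well.
encoded = quote(svg, safe="")
decoded = unquote(encoded)

print(encoded)
print(decoded)
print(len(svg), len(encoded))
```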