by wizeman 1266 days ago

This appears to be quite nice assuming it works, but I wonder: how does this work?

I thought ChatGPT could only process a very limited context (a few thousand words or so) and this claims to work for repositories <100 MB, which is gigantic in comparison.

The site says: "we will download [the repo] to the server and create embeddings against the code". What kind of embeddings are these?

2 comments

penguinten 1266 days ago

It uses the OpenAI embedding models to get around the context window problem. You can compress large amounts of text/code into just a few tokens.
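For anyone wondering what that looks like in practice, here is a minimal sketch of the usual embedding-retrieval pattern. It is not necessarily how this site works. The assumption is that the repo is split into chunks, each chunk is embedded once, and at question time only the few chunks closest to the question are pasted into the prompt as plain text. The embedding vectors are used only to choose the chunks and are never given to the model. `toy_embed` is a hypothetical stand-in (a word-count vector) so the example runs offline; a real setup would call an embedding model instead. All function names and the sample chunks are made up for illustration.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a word-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_chunks(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks whose vectors are closest to the question's."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, chunks: list[str], k: int = 2) -> str:
    """Assemble the text prompt: only the selected chunks plus the question."""
    context = "\n---\n".join(top_chunks(question, chunks, k))
    return f"Answer using only this code:\n{context}\n\nQuestion: {question}"


# Made-up chunks standing in for pieces of a large repository.
chunks = [
    "def parse_config(path): return json.load(open(path))",
    "def send_email(to, body): smtp.sendmail(SENDER, to, body)",
    "def render_svg(shape): return wrap_in_svg_tag(shape)",
]

prompt = build_prompt("how is the config file parsed?", chunks, k=1)
print(prompt)
```

The repo can be far larger than the context window because only the `k` selected chunks ever reach the model, and `k` is chosen so the prompt fits.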

LunaticDragon 1265 days ago

Can you elaborate on this? What is the input to the GPT in that case? I was under the impression that GPT is given an array of tokens and it produces one token as its output.

iamflimflam1 1265 days ago

Would love some more information on how to do this. Can you share details or point to a resource?

cipherpro 1265 days ago

https://beta.openai.com/docs/guides/embeddings/use-cases

joshka 1265 days ago

I'm curious if you can share LoC -> token count metrics?

realYitzi 1265 days ago

I don't know about embeddings, but it was able to decode a URL-encoded string into a JavaScript-animated SVG. So larger texts can be encoded and still "understood." You probably wouldn't even have to declare your encoding method. https://www.tiktok.com/@y.i.t.z.i/video/7184820436839042306
