by jarulraj 997 days ago
Neat AI app!

1. What feature extractor is used to derive code embeddings?

2. Would support for more complex queries be useful inside the app?

   -- Retrieve a subset of code snippets
   SELECT name
   FROM snippets
   WHERE file_name LIKE '%py' AND author_name LIKE 'John%'
   ORDER BY
      Similarity(
         CodeFeatureExtractor(Open(query)),
         CodeFeatureExtractor(data)
      )
   LIMIT 5;
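
To make the idea concrete, here is a minimal Python sketch of what such a query would do: filter on metadata first, then rank the remaining snippets by similarity to the query. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real code feature extractor, the snippet fields just mirror the column names above, and a higher score is treated as "more similar", so results are sorted in descending order.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a code feature extractor: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(snippets, query, file_suffix="", author_prefix="", limit=5):
    """Filter by file name suffix and author prefix, then rank by similarity."""
    query_vec = embed(query)
    # WHERE file_name LIKE '%<suffix>' AND author_name LIKE '<prefix>%'
    candidates = [
        s for s in snippets
        if s["file_name"].endswith(file_suffix)
        and s["author_name"].startswith(author_prefix)
    ]
    # ORDER BY similarity (most similar first), then LIMIT
    ranked = sorted(
        candidates,
        key=lambda s: cosine(query_vec, embed(s["data"])),
        reverse=True,
    )
    return [s["name"] for s in ranked[:limit]]
```

The point of filtering before ranking is that the similarity score only has to be computed for snippets that already match the metadata constraints.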

Embeddings are done using ChromaDB.

Support for more complex queries could be useful, but probably not through a query language, since that would make free-form text input harder to use.

You can already use it through an API: https://kantord.github.io/SeaGOAT/0.27.x/server/#understandi... So the best way to support more complex queries would probably be to add query parameters, and to expose those flags/options/features through the CLI as well.
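
As a rough sketch of the query-parameter idea, a client could keep the free-form text as the query and pass the filters as optional parameters. Note that the base URL, the route shape, and the parameter names (`file_suffix`, `author_prefix`) below are all hypothetical; they are not SeaGOAT's actual routes or options.

```python
from urllib.parse import quote, urlencode


def build_query_url(base_url, query, **filters):
    """Build a search URL: free-form text in the path, filters as parameters.

    The route shape and filter names are hypothetical, for illustration only.
    """
    # Drop filters that were not set so the URL stays minimal.
    params = {key: value for key, value in filters.items() if value is not None}
    url = f"{base_url}/{quote(query, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


# Free-form text stays free-form; the structured part rides along as parameters.
print(build_query_url(
    "http://localhost:8080/query",  # hypothetical address
    "parse command line arguments",
    file_suffix=".py",
    author_prefix="John",
))
```

The same optional filters could then map one-to-one onto CLI flags, so plain text queries keep working unchanged.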

For those curious about it, ChromaDB uses all-MiniLM-L6-v2[0] from Sentence Transformers[1] by default.

[0] https://docs.trychroma.com/embeddings#default-all-minilm-l6-...

[1] https://www.sbert.net/docs/pretrained_models.html

Btw, I am also working on a web version that will let you search multiple repositories at the same time. You will be able to self-host it at work or run it locally on your machine: https://github.com/kantord/SeaGOAT-web

so that could provide a nicer interactive experience for more complex queries

It’d be cool if it worked against GitHub repos; then you could save the embeddings and have a unified interface for querying repos.

I had this problem when trying to learn a library and figure out what all its functionality is. I ended up making a non-AI solution (an Emacs package), but this seems just a step or two away from your current project imho.