by alexpovel 518 days ago
Agree!

For fun, I put together a GitHub bot for this purpose a while ago. It indexes all existing issues in a repo, creates embeddings, and stores them in a vector DB. When a new issue is created, the bot comments with the 3 most similar existing issues, ranked by vector similarity.
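A minimal sketch of that flow, not the linked project's actual code. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, a plain dict plays the role of the vector DB, and the issue titles are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(new_issue: str, index: dict[int, Counter], k: int = 3) -> list[tuple[int, float]]:
    """Return the k (issue number, score) pairs closest to the new issue."""
    query = embed(new_issue)
    scored = [(number, cosine(query, vec)) for number, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical existing issues, keyed by issue number.
existing = {
    1: "Crash on startup when config file is missing",
    2: "Add dark mode to settings page",
    3: "Startup crash if the config file cannot be found",
    4: "Typo in README installation section",
}

# "Indexing" step: embed every existing issue once, up front.
index = {number: embed(title) for number, title in existing.items()}

# "New issue" step: embed the new text and look up its nearest neighbours.
top = most_similar("App crashes at startup without a config file", index)
for number, score in top:
    print(f"#{number}: {score:.2f}")
```

A real bot would swap `embed` for calls to an embedding model and the dict for a proper vector store; the ranking step stays the same in spirit.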

In theory, that empowers users to close their own issues without maintainer intervention, provided an existing & solved issue covers their use case. In practice, the project never made it past PoC.

The mechanism works okay, but I found the available (cheap) embedding models not powerful enough. Technology-wise, though, it should be easy for GitHub to implement.

https://github.com/alexpovel/issuedigger

3 comments

We made a similar thing for our community Discord: you add an emoji to a message and it looks for similar issues with a simple RAG. That saves us so much time when a user asks whether a feature is planned. In the response, we also ask them to go upvote the issue or create one.

Not open source right now but if people are interested I could clean up the code.

Microsoft seems to use a similar bot themselves, not sure what it is called or whether it is OSS: https://github.com/microsoft/winget-cli/issues/4765#issuecom...
Oh yeah, that looks super similar. I remember it being tricky to get a useful signal out of the similarity score with the underlying model I used back then. Similar and dissimilar issues all hovered around the 0.80 mark. But surely not hard to improve on, with larger models and possibly higher-dimensional vectors.
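When raw scores all bunch up around one value like that, one generic workaround (an assumption here, not something the bot above did) is to judge each score relative to the others for the same query instead of against a fixed cutoff. A rough sketch using z-scores; the issue numbers, scores, and the 1.5 cutoff are made up for illustration:

```python
from statistics import mean, pstdev


def standardize(scores: dict[int, float]) -> dict[int, float]:
    """Convert raw similarity scores into z-scores across all candidates."""
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All candidates scored identically: nothing stands out.
        return {number: 0.0 for number in scores}
    return {number: (s - mu) / sigma for number, s in scores.items()}


# Hypothetical raw cosine scores for one new issue: everything sits near 0.80.
raw = {101: 0.79, 102: 0.80, 103: 0.81, 104: 0.80, 105: 0.86}

z = standardize(raw)

# Keep only candidates that stand out from the pack (cutoff is arbitrary).
outliers = [number for number, value in z.items() if value > 1.5]
print(outliers)
```

This only re-expresses the ranking the model already produced, so it helps with picking a cutoff but does not make a weak model any better at separating similar from dissimilar issues.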
If only Microsoft were interested in finding actually useful use-cases for their machine learning tech instead of constantly selling everyone on their chat bot...