Hacker News
by hamandcheese 519 days ago
A GitHub feature I think would be really handy is suggesting duplicate issues when writing up a new issue. Many projects ask that you search for already reported tickets, but GitHub's current search isn't great if you aren't sure what you are looking for.
9 comments

Agree!

For fun, I had put together a GitHub bot for this purpose a while ago. It indexes all existing issues in a repo, creates embeddings and stores them in a vector DB. When a new issue is created, the bot comments with the 3 most similar, existing issues, from vector similarity.
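A minimal sketch of that lookup step, not the bot's actual code: it assumes the embeddings already exist and uses made-up 3-dimensional vectors and issue numbers in place of a real embedding model and vector DB. The names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(new_vec, index, k=3):
    """Return the k (issue_number, score) pairs closest to new_vec."""
    scored = [(cosine_similarity(new_vec, vec), num) for num, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(num, score) for score, num in scored[:k]]


# Toy data (assumption): a real setup would get these vectors from an
# embedding model and keep them in a vector DB rather than a dict.
index = {
    101: [1.0, 0.0, 0.0],
    102: [0.9, 0.1, 0.0],
    103: [0.0, 1.0, 0.0],
    104: [0.0, 0.0, 1.0],
}
new_issue = [1.0, 0.05, 0.0]

top = most_similar(new_issue, index)
for number, score in top:
    print(f"#{number}: {score:.3f}")
```

With real embeddings, the bot would post the returned issue numbers as a comment on the new issue.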

In theory, that empowers users to close their own issues without maintainer intervention, provided an existing & solved issue covers their use case. In practice, the project never made it past PoC.

The mechanism works okay, but I've found available (cheap) embedding models to not be powerful enough. For GitHub, technology-wise, it should be easy to implement though.

https://github.com/alexpovel/issuedigger

We made a similar thing too for our community Discord, where you can add an emoji to a message and it will look for similar issues with a simple RAG. That saves us so much time when a user asks if a feature is planned or not. In the response, we also ask them to go upvote the issue or create one.

Not open source right now but if people are interested I could clean up the code.

Microsoft seems to use a similar bot themselves, not sure what it is called or whether it is OSS: https://github.com/microsoft/winget-cli/issues/4765#issuecom...
Oh yeah, that looks super similar. I remember the similarity score being tricky to get useful signal out of, for the underlying model I had used back then. Similar and dissimilar issues all hovered around the 0.80 mark. But surely not hard to improve on, with larger models and possibly higher-dimension vectors.
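To illustrate why scores that all hover around 0.80 are hard to threshold, here is a small sketch. It is not what I did back then, just one possible workaround: standardize the raw scores against the rest of the repo so that only clear outliers count as candidates. The scores, issue numbers, and the 1.5 cutoff are all invented for the example.

```python
import statistics


def standardize(scores):
    """Convert raw scores to z-scores (distance from the mean in std devs)."""
    mean = statistics.mean(scores)
    stdev = statistics.pstdev(scores)
    if stdev == 0:
        return [0.0 for _ in scores]  # all scores identical: no signal at all
    return [(s - mean) / stdev for s in scores]


# Invented raw cosine scores for one new issue against five existing ones.
raw = {"#12": 0.79, "#34": 0.80, "#56": 0.81, "#78": 0.80, "#90": 0.86}

z = dict(zip(raw, standardize(list(raw.values()))))

# Hypothetical cutoff: keep only issues well above the typical score.
candidates = [issue for issue, score in z.items() if score > 1.5]
print(candidates)
```

A fixed cutoff on the raw values would have to sit somewhere between 0.81 and 0.86 here, and that gap would likely shift from repo to repo; the relative score at least adapts to whatever baseline the model produces.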
If only Microsoft were interested in finding actually useful use cases for their machine learning tech instead of constantly selling everyone on their chat bot...
If we're talking issues (i.e. reports from external parties, like OSS users, and not internally defined tasks), then care is needed to avoid it working out like the Stack Overflow experience. What is it, you ask?

[Closed; Locked: not constructive; Duplicate of: #1701, #74656]

Users will fight such things for a simple reason: every larger OSS project has tons of open[0] issues that look like duplicates, and perhaps even are duplicates, but no one can tell because they're all ~forever old and unresolved, so new people keep adding them again to bring attention to them.

Perhaps GitHub should start sorting issues by "Last updated" instead of the issue number - then some of the duplicate reports would just turn into "+1"s and "bump!"s on the existing issues.

--

[0] - Or closed by stale bot, which is effectively still open, but with an insult on top.

I refuse to work with projects with a stale bot. As if ignoring an issue will just magically resolve it. I also refuse to use products with a stale bot once I discover it is used; they are usually bug-ridden due to uncovered issues being ignored.
Completely agree with this suggestion.

I've often wondered why GitHub hasn’t introduced this feature, because it feels like a really obvious thing to introduce and something that would add an immense amount of value to a lot of projects.

Cynical answer: because having users write up duplicate issues and then having maintainers close them is more engagement than warding off unnecessary toil. Gotta keep those metrics going up and to the right.
GitHub doesn’t have ads and makes its money off of enterprise subscriptions (and Copilot), so I don’t think “engagement” is a very important metric for them.
To the company, no. But to people trying to get a promotion/bigger budgets by proving the features they work on are getting a lot of usage, plausible.
> To the company, no.

Why not? Companies love to boast about MAUs and similar metrics (even if completely bogus); it has a good effect on stock prices.

Those are very general stats. They won't drop just because GitHub disincentivizes abuse of issues or pull requests.
I suspect it has more to do with issue management not being their core product, so it doesn't get the same attention a dedicated issue management system would give it.
Wait, if it's not their core product, what is? GitHub is, at its core, a file/history browser + issue management system + merge request system built around Git. There's not that much to it other than issue management.
Its core product is git hosting: you use it to host your git repositories. You use features such as Pull Requests to power how you merge within your git repositories. If the issue system isn't working it's not a big deal, but if we can't use git it's a massive deal. It's all in the name: GIThub.

Most companies don't use GitHub's issue management system; they use issue management tools such as JIRA, Trello, etc. Issue management, project management, CI/Actions, wiki, discussions, etc. are all nice-to-haves and are probably aimed more at the open source projects that are used as a marketing tool.

Most open source projects (you know, the thing GitHub claims to exist for) do pretty much exclusively use GitHub issues for issue tracking though. GitHub makes it pretty difficult to be on GitHub and not accept GitHub issues.
Meta and Google have this in their internal systems.
Good suggestion! Sounds similar to what Stack Overflow does when asking a question.
I was also having some frustration in navigating GitHub issues.

So I wrote a simple app for fun to navigate and search GitHub issues like emails, and even reply.

Screen recording: https://x.com/justruky/status/1878507719520387347

It definitely had something like this at least in beta within the last couple years, or maybe just based on the title.

But you’re completely right, GH search is truly bad.

Ideally there should even be an API for it so we can use it in other systems like Slack/Discord bots when people suggest improvements.
Linear (linear.app) does this, FWIW, built on vector search. We're actively working on making it more accurate too.