Github search sucks (and how it could be better) (github.com)
61 points by RenaudWasTaken 3383 days ago
4 comments

I use the Mozilla DXR project (or its searchfox.org fork) every day, and it is pretty great for searching code.

Not only can it quickly search across large codebases, it parses JS/Python and the output of clang (for C/C++) to allow quickly finding the definitions of functions, declarations of variables, and so on (try hovering over a variable for instance):

https://dxr.mozilla.org/mozilla-central/source/browser/compo... https://dxr.mozilla.org/mozilla-central/source/toolkit/mozap...

It's nothing that many mainstream IDEs can't do, but having it on the web, with links you can quickly share and no local setup required, helps tremendously in getting people up to speed.

I might just be searching for the wrong terms, but am I right that this is specific to searching the Mozilla codebase? Is there a version that can work on arbitrary codebases?
Can you use it with Github?
I would love to be able to search all code for a string and then either (1) sort the resulting repositories by stars/forks; or (2) limit the results to repositories with >X stars/forks. When learning a new framework or library I like to find popular projects that use it and read the code to get a sense of conventions, architecture, etc. For instance, it'd be fantastic to find all repositories with over 20 stars containing a *.py file with "import flask" or "from flask" in them.
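A minimal sketch of the query described above, run over made-up local data rather than any real GitHub API. The `Repo` record, its fields, and `repos_using_flask` are hypothetical names chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# Matches lines that start with "import flask" or "from flask" (optionally indented).
FLASK_IMPORT = re.compile(r"^\s*(?:import\s+flask\b|from\s+flask\b)", re.MULTILINE)


@dataclass
class Repo:
    # Hypothetical record: repository name, star count, and a path -> contents mapping.
    name: str
    stars: int
    files: dict = field(default_factory=dict)


def repos_using_flask(repos, min_stars=20):
    """Return repos with more than min_stars stars that contain a *.py file
    importing flask, most-starred first."""
    matches = []
    for repo in repos:
        if repo.stars <= min_stars:
            continue
        if any(path.endswith(".py") and FLASK_IMPORT.search(text)
               for path, text in repo.files.items()):
            matches.append(repo)
    return sorted(matches, key=lambda r: r.stars, reverse=True)


# Made-up sample data.
sample = [
    Repo("a/web", 150, {"app.py": "from flask import Flask\n"}),
    Repo("b/tiny", 3, {"app.py": "import flask\n"}),
    Repo("c/docs", 900, {"README.md": "import flask\n"}),
    Repo("d/api", 45, {"srv/main.py": "import os\nimport flask\n"}),
]
result_names = [r.name for r in repos_using_flask(sample)]
print(result_names)
```

Here the star filter and the content filter are applied together, which is the combination the comment asks for.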
All files ending in .py that contain "import flask":

https://github.com/search?q=filename%3A%2A.py+%22import+flas...

Use advanced search to specify stars:>20, etc.

Unfortunately, you can search code by file extension and phrase, and you can use advanced search to search for repository descriptions filtering by stars, but I don't believe you can do both at once.

For instance, searching for "flask" and limiting the results to >1000 stars returns only the 27 repositories with a matching description[0], but the code search returns over 4 million results, ignoring the stars parameter[1].

[0] https://github.com/search?l=&q=flask+stars%3A%3E1000&ref=adv...

[1] https://github.com/search?l=&q=flask+stars%3A%3E1000&ref=adv...

How would you build the search results UI for a grouped query like this? If one repository has 10k stars and has 1000 files with matching strings, should the first 1000 results be from the same repository?
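One possible answer, sketched under assumptions: collapse file-level hits into one entry per repository, order the entries by stars, and show only a few files from each. The function `group_hits`, the `per_repo` cap, and the sample data are all hypothetical; this is not how GitHub presents results.

```python
from collections import defaultdict


def group_hits(hits, stars, per_repo=3):
    """Collapse file-level hits into one entry per repository.

    hits: iterable of (repo_name, file_path) pairs.
    stars: dict mapping repo_name -> star count.
    Returns a list of (repo_name, total_hits, shown_paths), most-starred first.
    """
    by_repo = defaultdict(list)
    for repo, path in hits:
        by_repo[repo].append(path)
    ordered = sorted(by_repo, key=lambda name: stars.get(name, 0), reverse=True)
    return [(name, len(by_repo[name]), by_repo[name][:per_repo]) for name in ordered]


# Made-up data: one large repository with 1000 matching files, one small one with 2.
hits = [("big/framework", f"src/mod{i}.py") for i in range(1000)]
hits += [("small/app", "app.py"), ("small/app", "views.py")]
stars = {"big/framework": 10_000, "small/app": 40}

page = group_hits(hits, stars)
for name, total, shown in page:
    print(f"{name} ({stars[name]} stars): {total} matching files, showing {len(shown)}")
```

With a per-repository cap, the 1000 matches from the large repository take up one entry instead of the first 1000 rows.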
The (poor) choice of title here is the difference between "Here are some things that would make my life easier" and "I'm an entitled prick".
+1
Just going for a catchy headline here, though I can understand your point of view :)
GitHub should try to compile the code it hosts. Where that succeeds, it would give them full type information for every variable and information on every function call, just like a good IDE has when doing code completion.

With that info, they would be able to build an awesome search system.

That costs a LOT of money to do (infra is $$$) and you also have the issue of sandboxing code.

It would be hard and expensive with not much benefit to their bottom line.

GitLab manages to offer CI (partnering with DigitalOcean), which got me to switch at least partially, and it has further upsell potential (more CI servers! with exotic configurations!)

Not all languages that could benefit significantly from extracted type info are even compiled. Something much smaller in scope would be to make GitHub's search aware of, and able to consume, some kind of IntelliSense-style database or structured documentation format that any CI process could output. (Of course, someone needs to write the tools to generate said output in the first place...)
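As a rough illustration of what such a generator might look like for one non-compiled language, here is a sketch that uses Python's standard `ast` module to list definitions and emit them as JSON. The record layout (`name`, `kind`, `path`, `line`) and the function `index_definitions` are invented for this example; no existing index format is implied, and it records only where names are defined, not type information.

```python
import ast
import json


def index_definitions(source, path):
    """Return one record per function or class definition in a Python source string."""
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            records.append({"name": node.name, "kind": kind,
                            "path": path, "line": node.lineno})
    # Sort by line so the output order follows the file, not the traversal order.
    return sorted(records, key=lambda r: r["line"])


example = "class App:\n    def run(self):\n        pass\n\ndef main():\n    App().run()\n"
index = index_definitions(example, "app.py")
print(json.dumps(index, indent=2))
```

A CI job could run something like this over each file and publish the JSON for a search service to consume.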