Hacker News
by orsenthil 3088 days ago
Why should we use this instead of GitHub search (for our private repos)?

I mean, honestly, if you are happy with GitHub search, you should stay where you are.

Personally, I find it to be absolutely horrendous. It’s so much faster to clone the repo and search it locally.

> Personally, I find it to be absolutely horrendous.

Why is that so?

These are the reasons I'm aware of:

1. There doesn't appear to be any relevancy sorting. It appears that only exact matches of the term are returned. If matching isn't exact, I'm not sure how to control whether it looks for an exact match or what strategy it uses for fuzzy matching. Does it tokenize? Use some kind of Levenshtein distance algorithm?

2. The query results waste a huge amount of screen space. This means searching for a moderately common term in a large codebase is prohibitively time-consuming compared to cloning + ripgrep or whatever.

3. There's no way to search file names and file contents together. It took me 7 years after GitHub's creation to realize you could search for file names by pressing 't' on a repository page.

4. No regex or globbing support, to my knowledge.
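Point 1 asks whether fuzzy matching uses Levenshtein distance. As a minimal sketch of what that would mean (this is not a description of how GitHub actually matches or ranks results), here is the classic dynamic-programming edit distance plus a hypothetical `fuzzy_matches` filter that keeps candidates within a small number of edits, closest first:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string inner
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


def fuzzy_matches(query, candidates, max_distance=2):
    """Hypothetical fuzzy filter: keep candidates within `max_distance`
    edits of `query`, sorted by distance (ties broken alphabetically)."""
    scored = sorted((levenshtein(query, c), c) for c in candidates)
    return [c for distance, c in scored if distance <= max_distance]
```

For example, `levenshtein("kitten", "sitting")` is 3, and `fuzzy_matches("levenstein", ["levenshtein", "lowenstein", "distance"])` keeps the first two. Keeping only two rows makes memory proportional to the shorter string; a real search engine would need an index rather than scanning every candidate like this.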

This is before listing all the tooling (like Sourcegraph) that I would hope would be built into a source code repository to assist browsing but that is strangely missing: every IDE and editor out there is much faster for casually browsing code because navigation is so much cheaper and has far less friction.

I mean, overall it's not broken; it's just far less useful for searching a tree of code files than find/xargs/grep, and even more so compared to ack, The Silver Searcher, or ripgrep. If the capabilities I'm describing are there, they're well hidden. GitHub just isn't a good place to browse code.
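The find/xargs/grep workflow mentioned above boils down to: walk the tree, filter files by a name glob, then run a regex over each line. Roughly what `find . -name '*.py' -print0 | xargs -0 grep -nE PATTERN` or `rg -n -g '*.py' PATTERN` does can be sketched in Python as below. `search_tree` is a hypothetical helper, and Python's `re` syntax differs from grep's in places:

```python
import fnmatch
import os
import re


def search_tree(root, content_pattern, name_glob="*"):
    """Yield (path, line_number, line) for every line matching the regex
    `content_pattern` in files whose base name matches `name_glob`."""
    regex = re.compile(content_pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk never descends into it
        # (ripgrep likewise skips hidden directories by default).
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if not fnmatch.fnmatch(name, name_glob):
                continue  # file-name filter, like find -name / rg -g
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for lineno, line in enumerate(fh, start=1):
                        if regex.search(line):
                            yield path, lineno, line.rstrip("\n")
            except OSError:
                continue  # unreadable file: skip it silently
```

Calling `search_tree(".", r"TODO|FIXME", "*.py")` yields one tuple per matching line, so file-name and content filtering happen in a single pass over the whole tree with no pagination.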

1. I never really noticed that, because I mainly use Sourcegraph's code intelligence on open-source projects, and as a result search is something I rarely have to rely on.

2. You can restyle any page using something like Stylebot or a homebrew browser extension.

3. Although it's not something I do often, I find searching for file names on Google (for OSS projects) quite accurate, and then the Chrome extension lets you open that file on sourcegraph.com or inject code intelligence into the GitHub page as well.

4. GitHub sort of supports regex-like search; you can learn more at https://help.github.com/articles/searching-code/

2) isn’t about style; it’s about the fact that the results are paginated. You end up needing to search the damn search results, which is super slow when they’re paginated, whereas it could have been a quick scan if you could fill the screen with results and scroll rapidly through the rest.

This is just honest feedback, not a value judgement.

I don’t think the person you replied to is complaining about Sourcegraph. He is talking about how he dislikes GitHub code search, and it seems that he likes Sourcegraph.
I dislike GitHub search too, especially the pagination, but the issues GP mentioned can be avoided with some Google-fu and Sourcegraph for Chrome :)
GitHub search is pretty useless if you want to rely on it to find all uses of a given word, say during a refactoring.

https://stackoverflow.com/questions/43891605/search-partial-...

So, you end up having to clone all repos locally and grep for the word.
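The linked question is about partial-word matches. One common way such gaps arise is a word-based index that splits text on punctuation and only matches whole tokens. The sketch below contrasts that with plain substring matching (what grep does). The tokenizer and the identifiers `ENABLE_FEATURE_X`/`ENABLE_FEATURE_Y` are made up for illustration; this is not a claim about how GitHub's index actually works:

```python
import re


def tokenize(text):
    """Hypothetical word-based tokenizer: split on anything that isn't a
    letter or digit (so underscores and punctuation are dropped), lowercase."""
    return {t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t}


def token_search(query, documents):
    """Match documents that contain every query token as a whole token."""
    wanted = tokenize(query)
    return [doc for doc in documents if wanted <= tokenize(doc)]


def substring_search(query, documents):
    """Match documents that contain the query verbatim, like grep."""
    return [doc for doc in documents if query in doc]


docs = ["#ifdef ENABLE_FEATURE_X", "if (enableFeature) {", "ENABLE_FEATURE_Y = 1"]
token_hits = token_search("ENABLE_FEAT", docs)          # no whole token "feat"
substring_hits = substring_search("ENABLE_FEAT", docs)  # both ENABLE_FEATURE_* lines
```

A partial identifier like `ENABLE_FEAT` produces the token `feat`, which never appears as a whole token, so the token-based search returns nothing while the substring search finds both definitions. That is exactly the gap that matters when you need every use of a name during a refactoring.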

Just pointing out that limitation of GitHub search; I'm not saying this other tool is actually reliable for this kind of thing (I haven't used it before).

Sourcegraph CEO here.

To compare GitHub to Sourcegraph search, here is that same query on Sourcegraph.com (which is Sourcegraph Server running for all open-source code on GitHub):

https://sourcegraph.com/search?q=repo:Kurento/+DISABLE_LIBRA...

It works as expected (and as the SO poster wanted)! It shows desired results that GitHub search does not.

Regexps are also supported. Give it a try!

I'm the poster at SO :) Very cool! I'll definitely give it a try
Can you only search a single repo at a time?

Does it parse the code and store the AST, or is it just plain text?

It searches multiple repositories at a time. That query above searches all repositories in the given organization.

It does have code intelligence (parsing, semantic references and go-to-def, etc.) but that search is just a text search.

Our users prefer Sourcegraph over GitHub for code search for multiple reasons:

- Regular expression searches

- Exact searches (no ignoring punctuation, for example)

- Searches on any commit or branch, not just recently indexed master

- Diff searches (see https://about.sourcegraph.com/blog/introducing-sourcegraph-s...)

- Overall faster, more powerful searches and filtering capabilities

- Code intelligence (go-to-definition, find-references, hovers, etc.)

Not everyone needs these things. But users who do need them say that they save a lot of time and make them more productive.

At Google, for example, they have a similarly advanced internal code search system that developers love (see https://static.googleusercontent.com/media/research.google.c... and https://docs.google.com/document/d/1LQxLk4E3lrb3fIsVKlANu_pU... for research/numbers).

Even if your needs are met by GitHub's search, I would still suggest using the Sourcegraph Chrome extension (also available for Firefox), which adds code intelligence to code you view on GitHub: https://chrome.google.com/webstore/detail/sourcegraph-for-gi....

Did you get permission from Sourcegraph to post this comment to HN?

> You may not release the results of any performance or functional evaluation of any of the Software to any third party without prior written approval of Sourcegraph for each such release.

-- https://about.sourcegraph.com/terms/

We just removed that clause (also replied to your other comment about it). Didn’t intend for it to be in there; I agree it’s silly. Thanks for pointing it out.
Google also has this:

https://source.bazel.build

That's not an open-source product, and it isn't available in any other form, though it is backed by Kythe.
If you want an open-source indexed search, you can try out github.com/google/zoekt/. See here for a demo site: https://cs.bazel.build/

For example: https://cs.bazel.build/search?q=r%3Atorvalds+meltdown&num=50 searches the Linux kernel for "meltdown"
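The example link above is an ordinary query string. As a small sketch, Python's standard `urlencode` reproduces it exactly from the raw query `r:torvalds meltdown`. The `q` and `num` parameter names are copied from that link, and `zoekt_search_url` is a hypothetical helper, not part of zoekt. In that query, `r:torvalds` appears to act as a repository filter, since the comment says it searches the Linux kernel:

```python
from urllib.parse import urlencode


def zoekt_search_url(query, num=50, base="https://cs.bazel.build/search"):
    """Hypothetical helper: build a URL shaped like the example link above.

    urlencode() percent-encodes ':' as %3A and turns spaces into '+'.
    """
    return f"{base}?{urlencode({'q': query, 'num': num})}"


url = zoekt_search_url("r:torvalds meltdown")
```

Building the link from the raw query rather than hand-encoding it avoids mistakes with characters like `:` and spaces, which show up constantly in filter-style queries.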

Is code intelligence a paid upgrade for all languages?
Yes, code intelligence (go-to-definition, find references, hovers, etc.) on Sourcegraph Server is a paid upgrade for all languages.

But you can try/use it for free on open-source projects using our Chrome extension (to get it on code you view on GitHub) at https://chrome.google.com/webstore/detail/sourcegraph-for-gi... or on our public site directly at https://sourcegraph.com/github.com/gorilla/websocket/-/blob/... (for example).

The Chrome extension is marvellous, for anyone who hasn't used it. Particularly useful for looking up docs of calls to external packages in Python repos.
I don't really get the pricing on code intelligence. So if I have 50 users and want JavaScript, Python, and PHP, that's $750/month, even if 25 users only ever use Python?
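The commenter's figures imply a rate of $5 per user per language per month (750 / (50 × 3)). That rate is inferred here from the comment, not taken from Sourcegraph's pricing page, and the assumption that the other 25 users use all three languages is purely illustrative. Under those assumptions, the "wasted" spend the commenter is worried about works out as follows:

```python
# Figures from the comment above: 50 users, 3 languages, $750/month.
# The per-user, per-language rate is INFERRED from those figures
# (750 / (50 * 3) = 5); it is not a published price.
IMPLIED_RATE = 750 // (50 * 3)  # $5 per user per language per month


def flat_cost(users, languages, rate=IMPLIED_RATE):
    """Every user is billed for every enabled language."""
    return users * languages * rate


def per_assignment_cost(groups, rate=IMPLIED_RATE):
    """Hypothetical alternative: each group of users is billed only for the
    languages it uses. `groups` is a list of (users, languages) pairs."""
    return sum(users * languages * rate for users, languages in groups)


flat = flat_cost(50, 3)                             # 50 * 3 * 5 = 750
assigned = per_assignment_cost([(25, 1), (25, 3)])  # 125 + 375 = 500
unused_spend = flat - assigned                      # 250
```

Under these assumptions, a third of the flat bill pays for languages nobody on the Python-only half of the team uses, which is the perception problem raised in the next comments.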
If you want 3 or more languages, then contact us (at https://about.sourcegraph.com/pricing) and we can give you a package discount. Overall, if pricing is a concern, I'd love to learn more. I'll email you.
I think it really boils down to a perception issue. I think more people would be happy paying a flat fee for a language package than dealing with the angst of "wasted" money paying for languages that some users don't leverage. The package as it currently stands would "work" if I could assign languages to users and have that reflect back up to pricing, but that's cumbersome to manage from a customer perspective and sounds pretty painful to implement from the provider perspective.

It may be useful to consider a baseline two-language deal, as JavaScript plus one server-side language covers a huge number of use cases.

That said, you cover 2 out of 3 of the following scenarios pretty well with the existing model; I just happen to fall into the third, which is probably the smallest sector for you guys anyway:

1 - Small startups, probably standardized on one or two languages, 8-10 people.
2 - Larger orgs (200+) where the cost is negligible compared to revenue.
3 - Medium-sized, microservice/squad-based orgs, with heterogeneous language support but focused within teams.

GitHub search is quite useless. Results are incomplete in unpredictable ways. I've had good results with etsy/houndd.