If you're interested in searching code with an open source solution, you might also want to look at Zoekt: github.com/google/zoekt.
There is a demo site where you can search 30G of source code (including the Linux kernel, Android and Chrome) supporting regular expressions, and file name search:
Sourcegraph CEO here. Thanks for this post. We packed a lot of stuff into 2.4: faster, more powerful code search, Google Alerts-style search monitoring, diff searches, and more.
It’s now free on a single server for any number of users and repositories.
What benefits does it have over git grep, esp. if I'm using a monorepo? What new patterns/possibilities does it enable? Is it maybe speed somehow? (I assume it could then be more "live search/exploration"/rapid exploration than git grep - but OTOH wouldn't it require some slow reindexing after each change?)
For code search on a monorepo, Sourcegraph is often faster in UX and execution time for a lot of tasks. It's easier/faster to filter the results than `git grep`, you can see more on your screen, it's easier to jump to the full file, it's easier to see blame info for particular lines, etc.
Sometimes while coding you just need to find where something is so you can edit it or jump to it. In that case, your editor's search or `git grep` is definitely better. But when you're looking for example code, reviewing/reading code, or debugging code, it's often better to do it in a UI that's more optimized for those tasks than `git grep` and your editor.
And then Sourcegraph also has code intelligence, code host browser extension integrations, saved queries, etc., beyond the basic code search.
BTW, Sourcegraph doesn't use an index for search. We heavily optimized the performance of searching an arbitrary revision that has never been indexed. So no slow reindexing after each change.
Just tried this out on some of our code bases. It appears to fail to generate snippets/highlights for all files that contain non-Unicode text, e.g. "Müller" in ISO-8859-1. Known issue?
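For anyone wondering why that input is awkward, here is a minimal sketch (not Sourcegraph's code; `decode_for_display` is a hypothetical helper made up for illustration). "Müller" encoded as ISO-8859-1 is not valid UTF-8, so a strict UTF-8 decode fails unless something falls back to another encoding:

```python
def decode_for_display(raw: bytes) -> str:
    """Decode file bytes as UTF-8, falling back to ISO-8859-1.

    Hypothetical helper for illustration only. ISO-8859-1 maps every
    byte to a code point, so the fallback itself never raises.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


latin1_bytes = "Müller".encode("iso-8859-1")  # the byte for "ü" is 0xFC here

try:
    latin1_bytes.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False  # 0xFC is not a valid UTF-8 start byte

print(strict_utf8_ok, decode_for_display(latin1_bytes))
```

The fallback is a guess about the encoding, not detection; it only guarantees that some text comes out so a snippet can be rendered.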
Many queries time out for me although I'm running it on a pretty beefy AWS c5 instance with SSDs. Queries such as "type:diff" don't seem to work at all on my code bases. It also does not appear to cache any data from previous runs of "git log", so attempting the suggested reload doesn't really work around the issue. Are you working on improving the performance?
We're actively working on improving the performance of diff search, but I would expect other types of queries to complete quickly. Would you mind sharing more about the size / characteristics of your repositories? Feel free to email me at beyang@sourcegraph.com if a private channel is better.
Three examples: 300MB size, 600 branches, 25k commits. 250MB, 250 branches, 15k commits. 80MB, 100 branches, 4k commits. Textwise it is a mix of Golang, JS and Python. Most of the repo size comes from binary resources (images etc).
Email and other kinds of notifications for saved queries are coming in the next release in early Feb. Email me (sqs@sourcegraph.com) if you'd like to preview them sooner. I agree they are crucial for this feature to feel truly complete and awesome.
The homepage does show a nice sparkline and results summary, though. Easy to see at a glance if new secret keys, deps, etc., are added to your repositories if you set up the queries.
Analytics lets you see statistics about how your own server's users are using it (each user's total count of pageviews and searches).
Telemetry lets you see the telemetry data it sends to Sourcegraph (which you can disable in the site config and never contains code/paths/repo names or anything derived from them).
Installed it, configured it, pointed it at the public cpython repo, it said 'cloning' and then pegged cpu for about 15 minutes (with about a third to a half of the usage in 'system') and eventually kernel-panicked.
OS 10.13.3 (17D34a) and Docker 17.12.0-ce-mac46 (21698). I don't think it's you, Docker is just still a bit flaky there, which is why I'm asking. I'll try it with something smaller.
We'll definitely consider it. What benefits would you get from having non-Docker? (Not saying there aren't any, just curious what your biggest needs are.)
Re: PostgreSQL configuration, is it that you want to be able to manage and back up the data yourself (not using the Docker container's internal PostgreSQL), or is it a tuning/performance concern, or something else?
Unrelated to the server but related to Sourcegraph: why did you guys switch away from the VS Code style editor on the web to an uneditable one? I loved using it.
Our security page is at https://about.sourcegraph.com/security. We have a security assessment that we can share with customers, but not one that we post publicly yet.
We have customers who run Sourcegraph on machines that are completely blocked off from the Internet and only have access to the specific IP ranges of their code hosts on the same network. You can set it up like that if you'd like, which would significantly reduce the risks without needing to trust any third parties (us or the security reviewer).
Sourcegraph only supports Git natively, not Mercurial. You could use Sourcegraph with Git mirrors of your Mercurial repository, if that is appealing, and we'd definitely consider adding in some extra translation work so that the Mercurial metadata embedded in the Git mirror repository would be respected. Does your code host (or do you internally) already have a Git mirror of your repository?
Yes. I was wrong in my previous answer - after sqs's response, I looked them up on LinkedIn, and they have at least one German developer in Berlin.
(FWIW - and I'm only saying this in case you were in a similar situation: I applied to them for a job not because I was looking for one, but because I accidentally saw it on HN and the match between my skills and their apparent need was simply "too good to be true" territory. I wasn't necessarily expecting an offer, but I expected to talk to someone - was curious to learn more about what they're doing. However, I got rejected straight away - so I just assumed that they said "REMOTE" for the heck of it... I know it sounds arrogant, but I have a hell of a hard time believing their other applications outclass me so obviously that it was not even worth talking to me, so I assumed it must be something else)
Wow, that's a pretty awful UI. The default search form screams "advanced search" from the late 90s. The compare example looks pretty dated too. I think Kibana and Sourcegraph are on the right track with a single input field that accepts field:value type searches. They're great once you've learned them.
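To make "field:value type searches" concrete, here is a rough sketch of how a single input box could be split into filters and free-text terms. This is an assumption-laden toy, not how Kibana or Sourcegraph actually parse queries; `parse_query` and the example field names are made up for illustration.

```python
from typing import Dict, List, Tuple


def parse_query(query: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split a query string into field:value filters and plain terms."""
    filters: Dict[str, List[str]] = {}
    terms: List[str] = []
    for token in query.split():
        field, sep, value = token.partition(":")
        # Count the token as a filter only when there is a colon, the part
        # before it is purely alphabetic, and the part after it is non-empty.
        if sep and field.isalpha() and value:
            filters.setdefault(field, []).append(value)
        else:
            terms.append(token)
    return filters, terms


filters, terms = parse_query("repo:linux type:diff craz[yi]")
print(filters, terms)
```

Everything that is not a filter stays a search term, which is what lets one text box replace a multi-field "advanced search" form.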
OpenGrok implements hyperlinking for C/C++ code, which is the primary productivity multiplier since it allows you to easily jump around call graphs. That functionality is sorely missing here (unless I am missing this feature somehow).
Yes, Sourcegraph supports GitLab repositories! Check out https://about.sourcegraph.com/docs/server/config/repositorie... and the section right below for auth. You'll need to add and authenticate them one-by-one in the config. Soon we'll be adding direct GitLab integration like we have for GitHub and GitHub Enterprise, which will sync all (or selected) repositories using the GitLab API.
The source code is not public for this version. I think that source-available but non-open-source licenses are an idea ahead of their time when applied to user-facing software like Sourcegraph. I hope that changes, and we'd love to make Sourcegraph source-available again, but it actually introduced (rather than eliminated) questions in the process of companies adopting Sourcegraph. I'll probably blog about this soon because it's something I care about a lot.
> I'll probably blog about this soon because it's something I care about a lot.
Yes, please! I have seen your videos and read a lot about your thoughts on this subject.
Another question, if you don't mind: does sourcegraph have a forum or irc/slack? A quick search for sourcegraph+slack ends up finding many hits.. for your name, lol.
Cool! We don't have a public Slack/IRC yet, but it seems like something we might do in the future. In the meantime, we're all pretty responsive on Twitter and on email.
And yeah, my last name being Slack does create some confusing moments sometimes. :)
1. There doesn't appear to be any relevancy sorting. It appears only the exact term is returned. If it’s not exact, I am not sure how to control whether or not it looks for an exact match and/or what strategy it uses to fuzzy match. Does it tokenize? Use some kind of Levenshtein distance algorithm?
2. The query results are hugely wasteful in terms of screen space. This means searching for a minorly common term in a large codebase is prohibitively time consuming compared to cloning + ripgrep or whatever.
3. There's no way to search file names + file content. It took me 7 years after github's creation to realize you could search for filenames if you press 't' on the repository.
4. No regex or globbing support, to my knowledge.
This is before listing all the tooling (like sourcegraph) I would hope would be built into a source code repository to assist browsing but are strangely missing--every IDE and editor out there is much faster at casually browsing code because navigation is so much cheaper and frictionless.
I mean overall it's not broken, it's just way less useful for searching a tree of code files than find/xargs/grep is, let alone ack/thesilversearcher/ripgrep. If the capabilities I'm describing are there, they're well-hidden. Github just isn't a good place to browse code.
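For reference on the fuzzy-matching question in point 1: Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. Below is a minimal sketch of the textbook dynamic-programming version. It says nothing about what GitHub's search actually does, which the comment above leaves as an open question.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (row-by-row DP)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (or keep it)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))
```

A fuzzy search could, for example, accept terms within a small distance of the query, at the cost of more candidate comparisons than an exact match.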
1. I never really noticed that because I mainly use Sourcegraph's code intelligence on open source projects, and as a result search is something that I rarely have to rely on.
2. You can stylize any page using something like stylebot or a homebrew browser extension.
3. Although not something that I do often, I find the filename search on Google (for OSS projects) quite accurate, and then the Chrome extension allows you to open that file on sourcegraph.com or inject code intelligence within the GitHub page as well.
2) isn’t about style, it’s about the fact that the results are paginated. You end up needing to search the damn search results, which is super slow when you’re paginated and it could have been a screen scan if you could fill the screen with results and scroll rapidly through the rest.
This is just honest feedback, not a value judgement.
I don’t think the person you replied to is complaining about Sourcegraph. He is talking about how he dislikes GitHub code search, and it seems that he likes Sourcegraph.
So, you end up having to clone all repos locally and grep for the word.
Just pointing out that limitation of search in GitHub, not saying that this other tool is actually reliable for this kind of thing (I haven't used it before).
To compare GitHub to Sourcegraph search, here is that same query on Sourcegraph.com (which is Sourcegraph Server running for all open-source code on GitHub):
If your needs are met by GitHub's search, then I would still suggest using the Sourcegraph Chrome extension (also available for Firefox), which adds code intelligence to code you view on GitHub: https://chrome.google.com/webstore/detail/sourcegraph-for-gi....
Did you get permission from Sourcegraph to post this comment to HN?
> You may not release the results of any performance or functional evaluation of any of the Software to any third party without prior written approval of Sourcegraph for each such release.
We just removed that clause (also replied to your other comment about it). Didn’t intend for it to be in there; I agree it’s silly. Thanks for pointing it out.
I don't really get the pricing on code intelligence. So if I have 50 users and want Javascript, Python, and PHP, that's $750/month, even if 25 users only ever use Python?
If you want 3 or more languages, then contact us (at https://about.sourcegraph.com/pricing) and we can give you a package discount. Overall, if pricing is a concern, I'd love to learn more. I'll email you.
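The arithmetic behind the question, as a quick sketch. The per-seat rate below is only inferred from the commenter's own figures (50 users, 3 languages, $750/month); it is not a quoted price, and the "pay only for the languages each user uses" split is a hypothetical reading of the question, not an actual Sourcegraph plan.

```python
from typing import Dict

# Figures taken from the question above.
users, languages, monthly_total = 50, 3, 750

# Inferred, not quoted: implied rate per user per language per month.
rate = monthly_total / (users * languages)


def monthly_cost(seats_per_language: Dict[str, int], rate: float) -> float:
    """Total monthly cost if each (user, language) seat is billed at rate."""
    return rate * sum(seats_per_language.values())


# Every user billed for every language.
all_seats = monthly_cost({"JavaScript": 50, "Python": 50, "PHP": 50}, rate)

# Hypothetical alternative: 25 users need only Python, and the other 25
# (an assumption) need all three languages.
per_use = monthly_cost({"JavaScript": 25, "Python": 50, "PHP": 25}, rate)

print(rate, all_seats, per_use)
```

Under those assumptions the gap between the two readings is $250/month, which is the amount the question is really about.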
I note with some distaste that it includes an Oracle-esque prohibition on benchmarking.
> You may not release the results of any performance or functional evaluation of any of the Software to any third party without prior written approval of Sourcegraph for each such release.
Sourcegraph CEO here. That shouldn’t be in there, I agree—we meant to remove that section. It will be removed in a couple of minutes. Please try it, use it, and post lots of evaluations about our product. :)
There is a demo site where you can search 30G of source code (including the Linux kernel, Android and Chrome) supporting regular expressions, and file name search:
https://cs.bazel.build/?q=%20
For example, https://cs.bazel.build/search?q=+r%3Atorvalds+craz%5Byi%5D&n... looks for craz[iy] across the Linux kernel.
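As a small illustration of what that pattern matches: the character class `[yi]` accepts either letter, so `craz[yi]` finds "crazy" as well as words like "crazier" or "craziness". The sketch below uses Python's `re` on made-up sample lines (not real kernel source) and is not Zoekt's implementation; a character class means the same thing in both.

```python
import re

pattern = re.compile(r"craz[yi]")

# Hypothetical sample lines, invented for this example.
sample_lines = [
    "/* this is a crazy hack */",
    "/* even crazier than before */",
    "/* the latest craze */",
    "/* nothing to see here */",
]

matches = [line for line in sample_lines if pattern.search(line)]
for line in matches:
    print(line)
```

Only the first two lines match; "craze" does not, because "e" is not in the class.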