These ideological decisions don't sound very pragmatic. There's a lot of open-source prior art in this space (OpenGrok, Kythe, SourceGraph) which provide support for most large languages and have annotation output formats that are broadly similar to this JSON file, and you could still support users having indexers for small languages running as part of CI.
> There does not exist any widely available standalone C parsing library to provide C programs with access to an AST. There’s LLVM, but I have a deeply held belief that programming language compiler and introspection tooling should be implemented in the language itself. So, I set about to write a C parser from scratch.
Even if you prefer to write your C indexer in C, you could use LLVM's C [1] or Python [2] APIs. Plus, you can handle C++ without having to implement your own C++ parser from scratch, which is a much larger undertaking than C99 plus a few GNU extensions.
One problem with OpenGrok et al is scale. I already have a service which is designed to run arbitrary user tasks in an environment configured for their project's needs, so I wanted something that could take advantage of that.
As for parsing C++, since LLVM is written in C++, using it to write a C++ annotator would be a natural fit :) But C and C++ are different languages and I don't wish to require LLVM to deal with it. LLVM is one of the largest open source projects on the net, and it requires a lot more complexity and compile time to utilize under these circumstances. On the other hand, I came up with a solution which is <1,300 lines of code and won't grow much more as it expands to support a broader set of C extensions.
There does exist prior art, but I deliberately chose to go with the lowest common denominator to provide support for a lot of use-cases we can't predict in an environment which gives users more control over its behavior. I think over time it will be pretty easy to plug the prior art into this system, but harder to plug their systems into novel use-cases. The existing solutions are not always the best, but I did put in a lot of research time to validate that assumption.
Github also recently open-sourced their Haskell-based Semantic, which annotates and cross-references a whole bunch of languages (all the languages any of our clients use), and is built on tree-sitter, so there's, like, several levels of prior art available here.
Another issue with Semantic that makes me less thrilled about using it here: say it doesn't support programming language $x, and you don't know or want to use Haskell, but you do know and want to use language $x. To add it to GitHub, you have to learn Haskell, which is no small mountain to climb. To add it to SourceHut, you can just leverage $x's existing tools.
But, plugging Semantic into SourceHut should be totally possible with some mild massaging of the output JSON.
Semantic is in Haskell, but since it doesn't use GHC, it cannot handle Haskell well (can't e.g. resolve type classes).
If I wanted good code reviewing with Haskell, I figure it would be best to translate HIE files (https://www.haskell.org/ghc/blog/20190626-HIEFiles.html) to LSIF (https://github.com/mpickering/hie-lsif), which is supported in VSCode. Because of the limitations of only parsing, GitHub alone will not be as powerful. If I then just make an LSIF to SourceHut converter, SourceHut will have better annotations than GitHub...
On SourceHut it's less language-aware and more generic, which makes it more useful for a wider range of use-cases. However, I could totally see a tool being possible which converts LSIF files into SourceHut annotations.
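To make the converter idea above a bit more concrete, here is a rough Python sketch. It assumes the LSIF references have already been resolved into (use range, definition line) pairs, and it assumes a link annotation carries `type`, `lineno`, `colno`, `len` and `to` fields with 1-based positions. Those field names, the 1-based offsets and the `lsif_ranges_to_annotations` helper are my guesses for illustration, not a documented schema, so check them against the actual annotation format before relying on this.

```python
import json


def lsif_ranges_to_annotations(references):
    """Turn resolved LSIF-style ranges into link annotations (sketch).

    `references` is a list of (use_range, def_line) pairs, where use_range
    is an LSIF "range" vertex with 0-based line/character positions and
    def_line is the 0-based line of the definition in the same file.
    The output field names are assumed, not taken from a spec.
    """
    annotations = []
    for use_range, def_line in references:
        start, end = use_range["start"], use_range["end"]
        if start["line"] != end["line"]:
            # This sketch only handles identifiers that sit on one line.
            continue
        annotations.append({
            "type": "link",
            "lineno": start["line"] + 1,       # assumed 1-based
            "colno": start["character"] + 1,   # assumed 1-based
            "len": end["character"] - start["character"],
            "to": f"#L{def_line + 1}",         # anchor of the definition line
        })
    return annotations


# Hypothetical input: one identifier on (0-based) line 37, defined on line 5.
example = [(
    {"label": "range",
     "start": {"line": 37, "character": 6},
     "end": {"line": 37, "character": 21}},
    5,
)]

# "<blob id>" is a placeholder key; the real key is whatever the service expects.
print(json.dumps({"<blob id>": lsif_ranges_to_annotations(example)}, indent=2))
```

Resolving LSIF edges (definition results, cross-file references) into those pairs is the hard part and is deliberately left out here.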
For Python, dig up the old PySonar project (the author took it down for some reason, but there are mirrors/archives where you can find versions). Oh hey, it looks like it has been resurrected (https://github.com/yinwang0/pysonar2). It might have been superseded by something else in the meantime, I dunno.
It was the basis for Google's internal Python annotations thingy, and it fscking rocks.
So excited to use this once my requirements are implemented (mostly just LFS, and to a lesser extent Merge(|Pull) Requests). Admittedly I don't need it, I just really appreciate the simplistic UI and straightforward pricing model.
LFS support is something I'd like to do, but It's Complicated(TM). Main challenges include finding a good place with ample bandwidth and storage, figuring out where/how to take backups of it, and measuring bandwidth and storage usage to integrate with billing. Not a priority right now, but may land between the beta and stable periods.
As for merge requests, don't hold your breath. SourceHut embraces the email-based model. A tutorial is available here to give you an idea of how it works: https://git-send-email.io and check out this video for the maintainer's side: https://aerc-mail.org/
The advantages of email include:
- It's based on venerable and well-understood standards, with ample open-source tooling available
- It's decentralized, federated, and highly fault tolerant
- It doesn't lock you into my platform, you have ownership over your content and can freely interact with projects anywhere
It's also easy and natural to review code by writing emails, and by far the most efficient workflow for git collaboration I've used (having extensively worked in email, GitHub, GitLab, and Gerrit). I think you should give it a chance!
Well, technically (and as I'm sure you already know) pull requests are email-based[1] (they are used for Linux kernel development after all), so SourceHut already "supports" it. It could make sense to leverage that in the web UI. User A, who prefers email, emails the output of git request-pull to the mailing list, while user B, who prefers web, presses some button that does that for them. Doing the actual pull would work similarly. It also shares the advantages you listed, although an emailed patch stays in your mailbox backups forever, whereas a pull request URL could become invalid. That said, I'm sure it's easy to underestimate the effort it takes to build a nice pull request UI.
It definitely would make SourceHut more attractive for many people, but so would a nice Git plugin for VS Code that provides a nice UI for automating the git send-email process.
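For what it's worth, the "button that does it for you" flow described above could be sketched roughly like this in Python. `git request-pull <start> <URL> [<end>]` is the real command; everything else (the helper names, the example addresses, the subject line) is made up for illustration, and actually sending the message is left out.

```python
import subprocess
from email.message import EmailMessage


def request_pull_argv(start, url, end="HEAD"):
    """Build the argv for `git request-pull <start> <URL> [<end>]`."""
    return ["git", "request-pull", start, url, end]


def build_pull_email(summary, sender, list_addr, subject):
    """Wrap the request-pull summary in a plain-text email."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = list_addr
    msg["Subject"] = subject
    msg.set_content(summary)
    return msg


def make_pull_request(start, url, end, sender, list_addr, subject, cwd="."):
    """Run git request-pull in `cwd` and return the email to send.

    Raises subprocess.CalledProcessError if git exits non-zero.
    """
    result = subprocess.run(
        request_pull_argv(start, url, end),
        cwd=cwd, check=True, capture_output=True, text=True,
    )
    return build_pull_email(result.stdout, sender, list_addr, subject)


if __name__ == "__main__":
    # Hypothetical values; run inside a repository where they make sense.
    email = make_pull_request(
        "v1.0", "https://example.org/~user/project", "HEAD",
        "user@example.org", "project-devel@example.org",
        "[PULL] feature branch",
    )
    print(email)
```

Handing the resulting message to an SMTP server (or to whatever the web UI uses to post to the list) is the remaining step, and the receiving side still needs a way to perform the pull.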
My issue with emails is that the user experience isn't as good and while it can be a worthwhile tradeoff for the points you mention, most of them are irrelevant in an enterprise setting (think using an internal SourceHut server instead of self-hosted GitLab or Bitbucket Server) so the UX benefit would be very much welcome.
I honestly disagree that the user experience isn't as good, and I don't think people have enough faith to try it out for a while. I've spent thousands of hours in many workflows and I still find email to be the easiest and most efficient way of doing it.
However, I don't focus on the enterprise crowd, you're right about that.
While that's probably true for seasoned OSS contributors, I think the email model is a tad abrasive for new/casual contributors.
I'm not necessarily saying it's the right choice for SourceHut, but I think it's part of the reason why GitHub is quite successful in the open source community.
I also think we're conflating use cases. Email is likely more efficient, I don't debate that. Yet I don't want it, because the time I do spend in a PR/MR UI (be it email or web UI) is so minimal that I want the process to be nearly fire and forget. The thought of learning some CLI-based email client just so that I can merge a branch from someone else once a month feels... well, it screams that it'll become like a Unix tool I rarely use.
I feel like he's asking me to open up Emacs every time I want to PR/MR, but I'm a Vim user. I don't have any email workflow currently, and I don't want to have to figure it out for a problem I don't have. He's telling me it's more efficient, but my cost vs savings ratio seems very off compared to what I imagine his is.
I don't disagree with anything he says about email, yet I have zero desire to try it out. If I was getting multiple MRs a day, or god forbid dozens a day, yea I'd be dying for something more efficient. But in the Emacs example, I wouldn't mind opening it if I was going to be doing it regularly.
Being required to use a workflow that you rarely use and is always in the "how did I do x?" part of your brain is a hard selling point for me. Even if that workflow is better, if I never remember how to use it what does it matter? It's still worse for me.
That's interesting - though part of why I like them to host the LFS is because I want the additional backups. It's definitely food for thought - the only LFS servers I've seen have been WIP / not-production-ready sort of things.
Yep, you could. One thing which would be super cool is highlighting a snippet of code, entering an annotation, and having it uploaded to git.sr.ht right there.
It has a nice summary of what's cool about it. It's very lightweight on the UI but it's actually very featureful, and includes mailing lists, CI service, etc.
I assume you're the same person I've been talking to on Lobsters. Clarification: they're talking about git push over https, which is deliberately unsupported in favor of the more secure SSH push option. git.sr.ht doesn't even have access to your password, so if the server is compromised then the attacker can't dump password hashes.
These are probably the opposite of what most people would call "Easy installation instructions" if you compare it to Gitlab's Omnibus package for example. They are well written and probably complete but not what people would call easy.
I suppose at some point I can set up a Dockerfile, but understanding each of these pieces is necessary to maintain even a Gitlab installation, and definitely to scale one (and Gitlab requires scaling much sooner than SourceHut, which is far more lightweight). Handholding is not always the right answer.
[1]: https://github.com/llvm-mirror/clang/blob/fb2a26cc2e40e007f1...
[2]: https://github.com/llvm-mirror/clang/blob/master/bindings/py...