by chrschilling 1804 days ago
IATAOJ (I Am The Author Of Josh) ;)

You are absolutely right about the main motivation for using a monorepo: allowing upstream library maintainers to see downstream usage of their code and make the required downstream changes themselves at the same time they change their libraries.

Also, as you say, the easiest way to get those advantages is to just check out the monorepo locally, so if there are no other reasons preventing you from doing just that, go for it.

However there are a few reasons why this is not always sufficient:

Size: The repo might be so large that cloning all of it makes local tools (git CLI, GUIs, ...) slow to use or, in the most extreme case, requires too much disk space for your machine. Git has native tools to address this, such as partial clone and sparse checkout, so size alone is not really the main issue for us.
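For reference, a minimal sketch of combining the two native git features mentioned above (partial clone and sparse checkout), driven from Python. The URL, destination and directory names are placeholders I made up, and the helper names are hypothetical; only the git flags themselves are real.

```python
import subprocess


def sparse_clone_commands(url, dest, dirs):
    """Build the git invocations for a blobless clone limited to `dirs`."""
    return [
        # --filter=blob:none makes this a partial clone: file contents are
        # fetched lazily. --sparse starts with only top-level files checked out.
        ["git", "clone", "--filter=blob:none", "--sparse", url, dest],
        # Limit the working tree to the listed directories.
        ["git", "-C", dest, "sparse-checkout", "set", *dirs],
    ]


def sparse_clone(url, dest, dirs):
    """Run the commands; raises CalledProcessError if git fails."""
    for cmd in sparse_clone_commands(url, dest, dirs):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values: print the commands instead of touching the network.
    for cmd in sparse_clone_commands(
        "https://example.invalid/monorepo.git", "monorepo", ["tools", "library1"]
    ):
        print(" ".join(cmd))
```

This keeps the checkout small, but everyone still shares one repository and one history, which is why it does not help with the permission and sharing points below.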

History "pollution": Having a lot of somewhat loosely related projects in one tree means a history that shows all the changes. Yes, git can filter them, but that might again be a performance concern. Still, this is not really the biggest motivation for creating a new approach/tool either.

Permissions: In some organisations (like the one I work for) it is not possible to give all developers access to all the code, and thus the advantages of a monorepo get lost just by trying to comply with data protection standards. The only solution with native git is to split the repo at legal (not necessarily technical) boundaries and try to coordinate the changes across those, losing most of the benefits described. Josh does not have a full-blown permissions system yet, but the concept certainly allows for it and the implementation is a work in progress.

Sharing with others (aka distributed VCS): This is the biggest motivation for using something like Josh. The partial repos are repos in their own right, and all the distributed features of git can be used with them. In a monorepo setup as you describe, the distributed workflow is sacrificed for monorepo advantages: only developers in the same monorepo see the same sha1s and can easily exchange changes. In Josh the same library can be part of different monorepos at different organisations, and while the monorepos have different histories and therefore different sha1s, the “projected” or “partial” library subrepos will have compatible history with identical sha1s. In this way Josh can serve as a bridge between organisations using different repo structures.
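To illustrate why identical sha1s are possible at all, here is a small sketch of how git derives blob and tree IDs purely from content. This is not Josh's implementation, just the standard git object hashing; the two "organisation" layouts and file contents are invented for the example. It only covers trees: a commit ID additionally depends on parents, author, committer and message, so those have to be reproduced identically as well for commit sha1s to match.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """sha1 over '<kind> <size>\\0<body>', the way git names loose objects."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_id(tree: dict) -> str:
    """Hash a nested dict: bytes values are files, dict values are directories."""
    entries = []
    for name, value in tree.items():
        if isinstance(value, dict):
            # Git sorts directories as if their name ended with '/'.
            entries.append((name + "/", b"40000", name, tree_id(value)))
        else:
            entries.append((name, b"100644", name, git_object_id("blob", value)))
    body = b""
    for _, mode, name, oid in sorted(entries, key=lambda e: e[0].encode()):
        body += mode + b" " + name.encode() + b"\0" + bytes.fromhex(oid)
    return git_object_id("tree", body)


# The same library embedded in two differently structured (made-up) monorepos.
library1 = {"lib.py": b"def f():\n    return 1\n", "README": b"library1\n"}
org_a = {"library1": library1, "app_a": {"main.py": b"print('a')\n"}}
org_b = {"vendor": {"library1": library1}, "docs": {"index.txt": b"hello\n"}}

same_subtree = tree_id(org_a["library1"]) == tree_id(org_b["vendor"]["library1"])
same_root = tree_id(org_a) == tree_id(org_b)
```

Here `same_subtree` is true while `same_root` is false: the library's tree ID does not depend on where it sits or what surrounds it, which is the property a projected history can build on.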

1 comments

This is a really interesting approach. How would you layer CI logic on top of this? Given your example workspace josh file,

    dependencies = :/modules:[
        ::tools/
        ::library1/
    ]
how are the canonical build artifacts for, say, ::library1/ determined, and how are they presented to the workspace?

I understand that the partial repo layering is the key innovation that exists a layer below what I'm talking about, but I'm trying to understand how you can ergonomically layer never-build-twice logic on top of it.

For CI, Josh is only used to determine whether a given commit affects a given workspace. This can be done server side using the Josh GraphQL API. Because the VCS server (in this case Josh) understands the dependencies of workspaces, such a query can be executed server side before any git clone/fetch needs to happen and, in our case, also before the CI allocates a machine to do the clone/checkout.
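As a rough local approximation of that decision (not the Josh GraphQL API, whose schema is not shown here), the question "does this commit affect this workspace" can be sketched as a prefix check of the commit's changed paths against the directories the workspace pulls in. The function name and the directory names, loosely modelled on the workspace example above, are assumptions for illustration.

```python
def affects_workspace(changed_paths, workspace_dirs) -> bool:
    """True if any changed path lies inside one of the workspace directories."""
    # Normalise to 'dir/' so that 'library1' does not match 'library10/...'.
    prefixes = tuple(d.strip("/") + "/" for d in workspace_dirs)
    return any(p.lstrip("/").startswith(prefixes) for p in changed_paths)


# Hypothetical monorepo directories feeding the example workspace.
WORKSPACE_DIRS = ["modules/tools", "modules/library1"]

if __name__ == "__main__":
    commit_paths = ["modules/library1/src/lib.c", "docs/intro.txt"]
    if affects_workspace(commit_paths, WORKSPACE_DIRS):
        print("schedule a CI job for this workspace")
    else:
        print("skip: workspace unaffected")
```

The point of doing this on the server is that a CI scheduler can make the skip decision without cloning anything.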

Which artifacts are to be built inside a given workspace is totally up to the build system(s) and tools that work after the files have been checked out to a working copy, at which point Josh is not involved at all.