by greg7mdp 3428 days ago
This is similar to what Google uses internally. See http://cacm.acm.org/magazines/2016/7/204032-why-google-store...:

"Most developers access Piper through a system called Clients in the Cloud, or CitC, which consists of a cloud-based storage backend and a Linux-only FUSE file system. Developers see their workspaces as directories in the file system, including their changes overlaid on top of the full Piper repository. CitC supports code browsing and normal Unix tools with no need to clone or sync state locally. Developers can browse and edit files anywhere across the Piper repository, and only modified files are stored in their workspace. This structure means CitC workspaces typically consume only a small amount of storage (an average workspace has fewer than 10 files) while presenting a seamless view of the entire Piper codebase to the developer."

This is a very powerful model when dealing with large code bases, as it solves the issue of downloading all the code to each client. Kudos to Microsoft for open sourcing it, and under the MIT license no less.
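
To make the overlay idea in the quote concrete, here is a minimal sketch in Python. It assumes the repository snapshot can be modeled as a read-only mapping from path to contents. The class and method names are hypothetical and are not CitC's or GVFS's actual API; the only point is that reads fall through to the shared snapshot while the workspace stores nothing but the files it has changed.

```python
class OverlayWorkspace:
    """Toy model of a workspace overlaid on a read-only repository snapshot.

    Hypothetical names throughout; this is an illustration, not a real API.
    """

    def __init__(self, base):
        self._base = base        # shared, read-only snapshot: path -> contents
        self._modified = {}      # only the files this workspace has written
        self._deleted = set()    # paths hidden in this workspace

    def read(self, path):
        # Local state wins; otherwise fall through to the shared snapshot.
        if path in self._deleted:
            raise FileNotFoundError(path)
        if path in self._modified:
            return self._modified[path]
        try:
            return self._base[path]
        except KeyError:
            raise FileNotFoundError(path) from None

    def write(self, path, contents):
        # Writes never touch the snapshot, only the overlay.
        self._deleted.discard(path)
        self._modified[path] = contents

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the path is not visible
        self._modified.pop(path, None)
        self._deleted.add(path)

    def stored_files(self):
        # What the workspace actually has to persist.
        return sorted(self._modified)


base = {"a/x.c": "int x;", "a/y.c": "int y;"}
ws = OverlayWorkspace(base)
ws.write("a/x.c", "int x = 1;")
print(ws.read("a/x.c"), "|", ws.read("a/y.c"), "|", ws.stored_files())
```

A real implementation would sit behind a file system driver and fetch snapshot contents lazily over the network, which this sketch leaves out entirely.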


Holy cow, it sounds like they reinvented ClearCase!
If they get this right, this can be MASSIVE for Microsoft in Enterprise. ClearCase was the reason why IBM was able to charge $1000+ per developer license fees.

ClearCase did what a lot of Enterprise companies needed at the time, and most importantly, it created hooks that were mostly too difficult to remove. Once you create deep integration with ClearCase, you are very much committed to using it long term.

I was planning to respond to the person who had replied, but their comment has since been removed. Since I can't edit my post, I'll add some insight here into why I used ClearCase as an example.

For those who have never worked with/administered ClearCase before, you may not fully appreciate how insanely complex it is. In order to use it, you first have to apply kernel patches from IBM, which shows how committed you had to be. ClearCase provided something that others couldn't, which is why it was so expensive. With Git, everything has changed.

Since nobody owns Git and its implementation, the differentiating factor right now is mostly superficial. There really isn't anything, other than hosting repos at a massive scale, that can't be easily duplicated. Git hosting, in my opinion, is now officially a commodity product. And what differentiates GitHub, GitLab, Bitbucket, etc. is mostly marketing.

With GVFS, things could change. This could be the first step toward Microsoft owning the hard part that can't be easily duplicated by others. I really don't know what is on their roadmap, but views in ClearCase were pretty powerful, and if they are looking at that level of integration, then it could be tough for GitLab, GitHub and others to follow.

My "favorite" ClearCase server issue was one that took 2 weeks of uptime before it resulted in a crash on an up-to-date AIX installation. I had to wait until we had somewhat identified the timeline before engaging our ops team so they could log the crash and submit it back to IBM to investigate and fix.
Agreed, same w/GVFS IMO.

I ruminated about ccase/git elsewhere in this thread: https://news.ycombinator.com/item?id=13560108

Google's Piper is impressive (I used it), but it emulates Perforce. Having something Git-based is a lot more exciting. Hope someone ports it to platforms other than Windows...
Google is far more advanced than this. They have one giant monorepo (Piper) that's backed by Bigtable (or at least it was, when I was there). Piper was mostly created in response to Perforce's inability to scale and be fault tolerant. Until Piper came along, they would have to periodically restart The Giant Perforce Server in Mountain View. Piper is 24x7x365 and doesn't need any restarts at all.

But the key bit here is not Piper per se. Unlike Microsoft, Google also has a distributed, caching, incremental build system (Blaze), and a distributed test system (Forge), and they are integrated with Piper. The vast majority of the code you depend on never actually ends up on your machine. Thanks to this, what takes hours at Microsoft takes seconds at Google. This enables pretty staggering productivity gains. You don't think twice about kicking off a build, and in most cases no more than a minute or two later you have your binaries, irrespective of the size of your transitive closure. Some projects take longer than that to build, most take less time. Tests are heavily parallelized. Dependencies are tracked (so tests can be re-run when dependencies change), and there are large-scale refactoring tools that let you make changes that affect the entire monorepo with confidence and without breaking anyone.

Google's dev infra is pretty amazing and it's at least a decade ahead of anything else I've seen. Every single ex-Googler misses it quite a bit.

I'm a Microsoft employee on the Git team. We do have a distributed, caching, incremental build system and a distributed test system. Right now, they're completely internal - like Google. They're called CloudBuild and CloudTest. They're very fast and no one thinks twice about kicking off a build.
Google employs a distributed, caching, incremental build system and a distributed test system across the majority of their code base. I worked in Windows Store and I can assure you that most people there don't use CloudBuild and CloudTest, let alone know what they are. I would be confident in saying the majority of people at Microsoft are in that boat.
Just to give others an idea of what this is like: I work on the Protocol Buffers team at Google. Almost all software at Google depends on my team's code. If I have a pending change, I can test it against the whole codebase before submitting (this is a "global presubmit"). If something breaks in global presubmit, I can build and run any target in the codebase with my change in a single command, and this will take O(10 minutes) to build from scratch.

This would be like if I worked on the core Windows SDKs and I could routinely test my changes against everything from Microsoft Flight Simulator to the Bing server code before I submit.
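
For a rough picture of how "test my change against everything that depends on it" can work, here is a small sketch, assuming the build graph is available as a mapping from each target to its direct dependencies. The function name and the target labels are made up for illustration, and this is not a description of Google's presubmit system: it only shows the reverse-dependency walk that selects which targets a change could affect.

```python
from collections import defaultdict, deque


def affected_targets(deps, changed):
    """Return every target that transitively depends on a changed target.

    deps maps target -> list of targets it depends on directly
    (hypothetical input format). Changed targets are included in the result.
    """
    # Invert the graph: for each target, who depends on it directly?
    rdeps = defaultdict(set)
    for target, direct in deps.items():
        for dep in direct:
            rdeps[dep].add(target)

    # Breadth-first walk over reverse edges, starting from the changed set.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in rdeps.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


deps = {
    "//proto:lib": [],
    "//search:server": ["//proto:lib"],
    "//search:server_test": ["//search:server"],
    "//games:sim": [],
}
print(sorted(affected_targets(deps, ["//proto:lib"])))
```

For a library that almost everything depends on, the affected set is close to the whole codebase, which is why this only becomes practical with distributed, cached builds and tests.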

Is the ChromeOS / Android team like this? Because it really sounds like we're comparing e-peens. I'd be surprised if Android builds typically build world and do so in 10 minutes.
There are parts of the company that aren't in this ecosystem, usually for legacy reasons. But even if they were you'd find a lot of this stuff to still be shockingly fast.

Because the build server is centralized it can be aggressive about caching intermediate build steps. Incremental builds aren't just incremental for you, but incremental for everybody.
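
A minimal sketch of why a shared cache makes builds "incremental for everybody", assuming each build step is identified by a hash of its command line plus the contents of its inputs. The class, its methods, and the fake compile step are hypothetical stand-ins, not Blaze's or Bazel's real interfaces.

```python
import hashlib


class SharedActionCache:
    """Toy cache shared by all users: same command + same inputs => same key."""

    def __init__(self):
        self._results = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def action_key(command, input_contents):
        # Hash the command and every input (name and content) in a stable order.
        h = hashlib.sha256()
        h.update(command.encode())
        for name in sorted(input_contents):
            h.update(b"\0" + name.encode() + b"\0")
            h.update(hashlib.sha256(input_contents[name]).digest())
        return h.hexdigest()

    def run(self, command, input_contents, execute):
        key = self.action_key(command, input_contents)
        if key in self._results:
            self.hits += 1           # someone already built exactly this
            return self._results[key]
        self.misses += 1
        result = execute(command, input_contents)
        self._results[key] = result  # now available to every other user
        return result


def fake_compile(command, inputs):
    # Stand-in for a real compiler invocation.
    return b"obj:" + inputs["a.c"]


cache = SharedActionCache()
cache.run("cc -c a.c", {"a.c": b"int a;"}, fake_compile)   # first user: miss
cache.run("cc -c a.c", {"a.c": b"int a;"}, fake_compile)   # second user: hit
print(cache.hits, cache.misses)
```

The sketch relies on build steps being deterministic functions of their declared inputs; if a step reads something that is not part of the key, a cache like this would return stale results.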

Dunno how it is now, but years ago it'd take them a _few weeks_ to just propagate commits into the stable branch through a series of elaborate branch integrations, so yeah, you couldn't change something and test it on a whim. Plus a build of just Windows alone would take overnight, and rebuilding everything to test a Windows change was not logistically, politically, or technically possible.
> you couldn't change something and test it on a whim

You could. It would just not leave your branch for a while. Around the scheduled merges it would run against the tests of progressively more of the larger organization.

Parts of this actually constituted a good way to prevent being distracted by the bugs of faraway teams. If something reached your branch, where you were working, it was vetted by the tests required to make it into winmain.

The downside was that people got fairly political about what goes into the branch and when, even for small things.

+1. I know some insiders at MS and their build/test/deployment story is universally very crappy. Things barely work, held together by curse words and duct tape.

Googlers like to joke internally that Google looks like a race car from the outside and like the Moving Castle from Hayao Miyazaki's cartoon from the inside, but that's not the case at all. Comparatively speaking it's a race car inside and out; it's just that the insiders don't know how shitty things are elsewhere.

P.S. I heard Bing is different, but I have no visibility into it, so can't comment.

I worked in the cloud at MS roughly 7 years ago, and Bing was very different from anything else. The MS cloud services were a total mess. I mean, as in hard to overstate how bad. In contrast, Bing was a well-oiled machine. Simple build, simple deploy, simple devops, simple test, very consistent processes across all of their teams.

At the time, Azure was a joke (partly due to the fact that the initial teams were headed up by ex Office devs with no cloud experience, if I remember correctly). But Azure was cannibalizing the Bing team pretty hard. I hear that strategy worked and that Azure is in much more capable hands now.

> Every single ex-Googler misses it quite a bit.

I dunno. I don't miss the 1-minute incremental builds. (Maybe they've improved since I left, though.)

BTW Forge is not just the test runner, but the thing that runs all build tasks, farmed out to all servers. Blaze interprets the build language and does dependency tree analysis but then hands off the tasks to Forge. Blaze has been (partially) open sourced: https://bazel.build/
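
The split described above (one component analyzes the dependency graph, another executes the resulting tasks) can be sketched with Python's standard `graphlib` module. This is only an illustration under the assumption that tasks form a DAG given as task -> prerequisites; the function name and task names are hypothetical, and nothing here reflects how Blaze or Forge are actually implemented. Each "wave" is a set of tasks whose prerequisites are all done, so an executor could farm out a whole wave in parallel.

```python
from graphlib import TopologicalSorter


def schedule_in_waves(deps):
    """Group tasks into waves that can each run fully in parallel.

    deps maps task -> list of tasks that must finish first.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)                 # a remote executor would run these
        sorter.done(*ready)                 # mark finished, unlocking dependents
    return waves


deps = {
    "app": ["lib", "proto"],
    "lib": ["base"],
    "proto": ["base"],
    "base": [],
}
print(schedule_in_waves(deps))
```

A real scheduler would not wait for a whole wave to finish before starting newly unblocked tasks; the wave grouping is just the simplest way to show the available parallelism.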

I know. :-)
> Google's dev infra is pretty amazing and it's at least a decade ahead of anything else I've seen. Every single ex-Googler misses it quite a bit.

This may be naive but why not recreate it as an open source project?

Blaze has been: https://bazel.build/

Forge and Piper are built on Google's internal tech stack and designed for Google's production infrastructure, so open sourcing them would be a very big project. I think it would be a lot more likely for them to be offered as a service -- and that might be more useful to users anyway, since you'd be able to share resources with everyone else doing builds, rather than try to get your own cluster running which might sit idle a lot of the time. Of course, there are privacy issues, etc.

(Disclaimer: I'm purely speculating. I left Google over four years ago, and have no idea what the tools people are up to today.)

Because very few people have a need to support a billion-LOC monorepo on which 30K engineers make tens of thousands of commits daily. That's where this system shines.

For smaller projects, Git+Bazel (open source, non-distributed version of Blaze) works fine if you're working with C++, and other build systems work OK as well, if you're working with other languages.

I think this would be better than Piper+CitC. While the virtual filesystem aspect is nice, the Perforce model is far inferior to Git's (IMO). Of course Google has tools on top of it. But it's not fair to compare a VCS and an interface to it to a complete development infrastructure. Hell, the content addressing of Git would even make it easy to build something similar.
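
For readers unfamiliar with what "content addressing" means in Git: a file's object ID is a hash of a short header plus the file's bytes, so identical content always gets the same ID no matter where it lives. The sketch below computes the ID the way Git does for blobs in a repository using the default SHA-1 object format (repositories created with the SHA-256 object format would differ). The helper name is made up for this example.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the Git blob object ID for the given file contents (SHA-1 format)."""
    # Git hashes "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known ID:
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the content, a client can hold a tree of IDs and fetch or cache the actual bytes on demand, which is the property the comment above is pointing at.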
> perforce model is far inferior to git's

That's just, like, your opinion, man. There are other bits of infra that integrate with it quite nicely, and would integrate with something like Git quite poorly. One of those things is their code review system. The closest thing I could find to it outside Google is Gerrit, but it's a tremendous pain to set up and use, and it's but a pale shadow of Google's internal tool (Critique).

And also, one does not preclude another: Google has a git wrapper on top of Piper, so you can spend your entire Google career not even touching Piper directly if that's what you prefer. And Piper went beyond the "Perforce model" in ways I can't disclose here.

Check out Reviewable, it's influenced by my fond memories of Critique but designed specifically for git (and GitHub).
Reviewable is amazing, thanks for making that.

I've used lots of review tools and worked a bit on Google's review tool and on ReviewBoard in the past, and Reviewable is better than all of them in my opinion (or at least, better than when I last used the others).

I wonder, how do you know what parts you CAN disclose?
Whatever is already public. You can watch a video about Piper and some other systems on YouTube, and read about other things from Google's own blogs, papers, etc.