HN Mirror

by aanm1988 3420 days ago

They have a billion files in their repo, 9 million are source files.

What the heck is the other 991000000?

I skimmed this. Mostly just stuff any competent company would/should be doing. it's google though, so they act like it's super awesome.

9 comments

Waterluvian 3420 days ago

Someone checked in a `node_modules` folder by accident.

jazoom 3420 days ago

That would pretty much do it

kyrra 3420 days ago

See this ACM article on it: http://m.cacm.acm.org/magazines/2016/7/204032-why-google-sto...

Lots of things aren't source files: test data, config files, build files, metadata, documentation, etc.

richsinn 3420 days ago

> Mostly just stuff any competent company would/should be doing. it's google though, so they act like it's super awesome.

Yes, you're absolutely correct. But here's the thing - it was actually Google that pioneered much of this. Many of the big/competent companies that follow these practices do so because Google's "DNA" leaked into them (via former employees bringing along the best practices they learned at Google, etc.).

jeremyjh 3420 days ago

They may have done a better job instituting these practices across a large organization, and some of their tools have very useful and novel features, but I very much doubt there is a single practice that they actually invented. If you think there is one, please be specific. I think what Google contributed is evidence that these practices can be instituted at scale, which really was sorely lacking in some cases. This helped the industry disseminate them.

richsinn 3417 days ago

Of course it's hard to say if they completely, 100% invented anything from scratch. But they sure did "pioneer" a lot of unique practices that other software companies were not following at the time.

A specific example - the practice of keeping the company's entire codebase in a single "source" repo. Pre-Google, it would've been considered outrageous for a sophisticated software company to keep all of its code in a single repo. But Google did it, and other companies have followed suit successfully (as Google DNA has leaked to other companies).

Yes, of course keeping code in a single repo is not a "new invention". Linux is a single repo; many smaller companies have only a single repo because their only product is a single web app. Google keeps nearly 100% of their entire codebase in a single repo - and that was definitely a novel approach at the time.

theparanoid 3420 days ago

Microsoft used to have the best practices and...they were mostly as good as Google's. Everything old is new again.

exmsft6780 3420 days ago

As someone who worked at both companies for a long time, I assure you that Google's best practices (circa when I switched) were a generation ahead of Microsoft's. Mostly due to MSFT having much longer software release cycles, a more primitive, Windows-based internal cloud, many legacy build systems, less inter-group trust, and little company-wide desire to improve things.

gumby 3420 days ago

> What the heck is the other 991000000?

Says right in the article: various config and dependency files, presumably kept both as caches (where everyone would generate the same product) and as a record of where things stood at time t.

For example:

> In some cases, notably Go programs, build files can be generated (and updated) automatically, since the dependency information in the BUILD files is (often) an abstraction of the dependency information in the source files. But they are nevertheless checked in to the repository.
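To make the quoted point concrete, here is a rough sketch of how a build rule could be derived from the imports in Go source files. It is not Google's tooling and not Bazel's real format: the `go_imports` and `build_rule` helpers and the rule layout are made up for illustration, the regex is a naive stand-in for a real Go parser, and real BUILD files reference target labels rather than raw import paths.

```python
import re

# Naive pattern (an assumption for illustration, not a full Go parser):
# matches either a parenthesized import block or a single-line import,
# optionally preceded by an alias such as `str`, `_`, or `.`.
_IMPORT_RE = re.compile(
    r'^\s*import\s*(?:\((.*?)\)|((?:[\w.]+\s+)?"[^"]+"))',
    re.MULTILINE | re.DOTALL,
)
_PATH_RE = re.compile(r'"([^"]+)"')


def go_imports(source):
    """Return the sorted, de-duplicated import paths found in Go source text."""
    paths = set()
    for block, single in _IMPORT_RE.findall(source):
        paths.update(_PATH_RE.findall(block or single))
    return sorted(paths)


def build_rule(name, sources):
    """Render a hypothetical BUILD-style rule from {filename: Go source text}."""
    deps = sorted({p for src in sources.values() for p in go_imports(src)})
    srcs = ", ".join(f'"{f}"' for f in sorted(sources))
    dep_list = ", ".join(f'"{d}"' for d in deps)
    return (
        "go_library(\n"
        f'    name = "{name}",\n'
        f"    srcs = [{srcs}],\n"
        f"    deps = [{dep_list}],\n"
        ")\n"
    )


if __name__ == "__main__":
    example = 'package main\n\nimport (\n\t"fmt"\n\tstr "strings"\n)\n\nimport "os"\n'
    print(build_rule("hello", {"main.go": example}))
```

Since the rule is a pure function of the source files, it could be regenerated at any time; checking it in anyway trades a little storage for not having to re-run the analysis on every build.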

pzh 3420 days ago

So basically somebody can write a script to put this Build file in gitignore, save the company millions of dollars, and get promoted for it?

jrockway 3420 days ago

There are other possible annotations in the build file.

You can get an idea of what it looks like by reading the Bazel docs: https://bazel.build/versions/master/docs/be/overview.html

Storing a few text files at Google doesn't cost millions of dollars, BTW.

mattnewton 3420 days ago

They don't use git or any other distributed version control system, so there is no incentive to keep it small. And anything outside the source control system isn't accessible to all the tools that use it, so it would introduce complexity.

harryjo 3420 days ago

Then you have to run a tool that processes .go source files in order to perform dependency analysis. Consistency is a virtue.

bskap 3420 days ago

Nah, because it would cost the company billions of dollars in lost productivity waiting for these files to get re-processed every time someone built the thing. Google's general philosophy is that humans are expensive and computers are cheap, so pretty much anything that helps the humans go faster is going to be a net benefit in the long run.

gsylvie 3420 days ago

People rarely get promoted for saving money.

williadc 3420 days ago

Finance people do.

gsylvie 3420 days ago

Good point. I meant software engineers rarely get promoted for saving money.

justicezyx 3420 days ago

This does not apply in this post though.

There are thousands of SWEs working on systems to save money for Google.

qu1j0t3 3412 days ago

They sure do at Amazon. Frugality is one of the explicit leadership principles and initiatives often have cost saving as a primary goal and always as a secondary goal.

xtracto 3420 days ago

It was very eye-opening and helpful for me. Given that at our startup we are just starting to grow and trying to set software development processes and standards to help with the growing number of devs, this info provides good guidance on what to aim for, and also showed me that we are on the right path in several ways.

dheera 3420 days ago

Well, I just fired up Android Studio and created a blank app. I ended up with no less than 77 files. Seriously, 77 files for a freaking BLANK app.

That "Hello World" Flask program that was 1 nice cute file? It's about 20 files deployed in Heroku.

Sometimes I wonder if things really need to be this complicated.

xapata 3420 days ago

No, they don't.

Veratyr 3420 days ago

> Mostly just stuff any competent company would/should be doing.

Many companies should be doing this. Few (that I know of) are doing this.

Making data-driven decisions also should be a thing, yet many still make them based on nonsense like politics.

dasil003 3420 days ago

Right, because data leaves no room for interpretation.

halflings 3420 days ago

No need to be needlessly sarcastic. Data-driven means that you collect various metrics on dev workflow, what slows productivity, or on the product side (user patterns, retention, etc.) and use those when making decisions. Unfortunately, many companies still base their decisions on very simplistic metrics and/or on "instinct".

dasil003 3419 days ago

Sorry, it was late and I didn't want to write a more substantive response.

The issue here is that politics are unavoidable. Being more data-driven is just another way of running your political process. And yes, it's a better way as long as you know its limitations. Collecting data and sifting through it to extract useful information takes time, creative thinking, and even "instinct" to figure out the right questions and hypotheses. Furthermore, if you're going to collect data on dev workflow, you'd better not tie employee incentives to it, or the metrics will be gamed.

One of my pet peeves is technical people who worship so strongly at the altar of rationality that they are blind to their own biases. Even the most guileless and logical engineer still has an emotional life and worldview that forms the building blocks of what turns into "politics" when you get a large group of people together.

halflings 3417 days ago

Agreed! I also hate it when people think their methods are so rational that they represent the ground truth, and are not biased in any way.

harryjo 3420 days ago

> would/should

There are billions of dollars of difference between "would" and "should"

Who is "they" who act like it's super-awesome?

tehlike 3420 days ago

Translation files, xml files, some data files, images, and lots of other things.

Ps: goog employee
