


	by taylorlafrinere 3318 days ago
	Most of that 300GB isn't text. There are test assets, images, videos, built binaries, VHDs, etc. Also, to be clear, that 300GB is just at tip (no history). We can debate whether those things should be checked into the repo, but they are there now.


alkonaut 3318 days ago

How did you go about creating the central repo, and how long did it take? Converting an SVN repo that is 2GB at tip with 100k commits is taking me many days, and each odd failure typically has me restart the process after filtering out some obscure part of the tree.

Edit: I read in another comment that you dropped the history. Understandable, but I can appreciate how that would add friction (devs having to look through two different histories).


vtbassmatt 3318 days ago

The Windows team developed a tool called "GitTrain" that knew how to:

- migrate the tip of a branch to Git (yes, the 300GB number is the tips of all the interesting branches, not the history)

- keep a Git branch and an SD branch in sync for a while

- be re-run over each of the 400+ branches they care about

But they went through some of the same trial-and-error process that you're describing.


mschuster91 3318 days ago

Whoa. 300GB with a shallow clone?! What size does the whole repo use on the server side?


wilatmsft 3318 days ago

The pack file size for a full clone is 187GB. The 300GB is the working directory. We did not import the history of the code base, so the current repo only has about 5 months of history. As others have called out, there are a lot of assets in the repo that don't compress.
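
For anyone who wants to compare the same two numbers on their own repo, here is a minimal Python sketch. It is not anything the Windows team used; it assumes a non-bare repo with the standard `.git/objects/pack` layout, and `dir_size` and `repo_sizes` are hypothetical helper names chosen for illustration.

```python
import os


def dir_size(root, skip=()):
    """Sum the sizes in bytes of regular files under root.

    Directories whose name appears in `skip` are not descended into.
    Symlinks are ignored so their targets are not counted.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not enter them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def repo_sizes(repo):
    """Return (working_tree_bytes, pack_bytes) for a non-bare repo.

    Assumption: packs live in .git/objects/pack. Loose objects and
    other files under .git are not counted in either number.
    """
    pack_dir = os.path.join(repo, ".git", "objects", "pack")
    working = dir_size(repo, skip={".git"})
    packs = dir_size(pack_dir) if os.path.isdir(pack_dir) else 0
    return working, packs
```

Calling `repo_sizes(".")` from the root of a checkout returns the two byte counts. A working tree larger than the packs is what you would expect when most history is short and the assets are already compressed formats.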


bokchoi 3318 days ago

Why only 5 months? Will more of the history be added to the git repository eventually?


vtbassmatt 3318 days ago

No, we'll keep the SD servers around for a while for servicing older products. We also have a "breadcrumbing" system that lets an engineer follow a file's history back from Git to the old system.


ysleepy 3318 days ago

Was importing the complete history tried during development? This is very interesting. The Git history will grow at breakneck speed and will reach a similar size soon enough. Is this to delay the inevitable tech wrangling for dealing with terabyte histories, or were there issues with the import/sync?

Or maybe it was just the initial repo setup used for alpha testing that got promoted to production :)
