by no_wizard
2170 days ago
On the topic of size, I wonder how small it would be if you were able to deduplicate all repositories against each other. I sometimes suspect there is a tremendous amount of copy/paste code out there masquerading as someone else’s. Even a naive deduplication might yield some very interesting results.

Reminds me of a time I caught someone using someone else’s code in an interview and passing it off as their own. (Using it was fine; it was the claim that it was theirs that bugged me.)
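For illustration only, here is a minimal sketch of what a "naive deduplication" could look like: hash every file's contents and count each distinct hash once. The `dedup_stats` function and the two toy repositories are hypothetical, made up for this example. It only catches byte-identical files, so lightly edited copies would still count as unique.

```python
import hashlib


def dedup_stats(repos):
    """Return (total_bytes, unique_bytes) across all repositories.

    `repos` maps a repository name to a dict of {path: file contents as bytes}.
    This layout is an assumption made for the sketch, not a real API.
    """
    seen = {}  # content hash -> size of that content
    total = 0
    for files in repos.values():
        for content in files.values():
            total += len(content)
            digest = hashlib.sha256(content).hexdigest()
            # Identical contents hash to the same digest and are stored once.
            seen.setdefault(digest, len(content))
    return total, sum(seen.values())


# Toy data: two hypothetical repositories sharing one copy/pasted file.
repos = {
    "alice/utils": {
        "leftpad.js": b"function leftPad() {}\n",
        "README": b"utils\n",
    },
    "bob/app": {
        "vendor/leftpad.js": b"function leftPad() {}\n",
        "main.js": b"run()\n",
    },
}

total, unique = dedup_stats(repos)
print(f"total: {total} bytes, after dedup: {unique} bytes")
```

A real attempt would have to stream files from disk rather than hold them in memory, and would probably want some fuzzier matching to catch near-copies.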
The size of all file contents (including older versions of files) is a few hundred TB, and everything else (directory structures, revision history, etc.) is under 10 TB.
So for GitHub alone it would be a little under that.