by zifnab06 2437 days ago
GitHub is primarily Rails/MySQL (or was the last time I paid attention to any of their blogs). I'm guessing they're storing dates as a TIMESTAMP and not a DATETIME (4 bytes vs 8 bytes).

GitHub's BigQuery public data set has 234,759,841 unique commits, and it appears there are 2 dates per commit (author and committer dates). At 4 extra bytes per date, switching to DATETIME would cost an extra ~1.9GB (234,759,841 × 2 × 4 bytes) per master/shard group.
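
For anyone who wants to check the arithmetic, here is a rough sketch. It uses only the numbers quoted above and assumes the 4-byte vs 8-byte column sizes from this comment; the function name `extra_bytes` is made up for illustration. It counts raw column data only, so index and row overhead are not included. Note that MySQL 5.6.4 and later store a DATETIME without fractional seconds in 5 bytes, in which case the gap shrinks to 1 byte per date.

```python
# Back-of-the-envelope estimate of the extra storage needed to switch
# commit dates from TIMESTAMP to DATETIME. All figures come from the
# comment above; this ignores indexes, row overhead, and replication.

UNIQUE_COMMITS = 234_759_841   # unique commits in the BigQuery data set
DATES_PER_COMMIT = 2           # author date and committer date
TIMESTAMP_BYTES = 4            # MySQL TIMESTAMP
DATETIME_BYTES = 8             # MySQL DATETIME, as assumed in the comment


def extra_bytes(commits=UNIQUE_COMMITS,
                dates_per_commit=DATES_PER_COMMIT,
                narrow=TIMESTAMP_BYTES,
                wide=DATETIME_BYTES):
    """Return the additional bytes needed to widen every date column."""
    return commits * dates_per_commit * (wide - narrow)


if __name__ == "__main__":
    total = extra_bytes()
    print(f"{total:,} bytes")
    print(f"{total / 1e9:.2f} GB, {total / 2**30:.2f} GiB")
```

Calling `extra_bytes(wide=5)` gives the estimate for the newer 5-byte DATETIME format instead.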

Entirely doable, but I have no idea what their scale actually is or how that translates to network throughput or anything else, really.