Hacker News
by jma24 4272 days ago
Doing some basic math... The Wikipedia dataset is around 80m rows a day, so 4 months of Wikipedia data is around 9.5bn rows. But they show 17bn on the graph.
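A quick sketch of that estimate in Python. It assumes "4 months" means roughly 120 days (my assumption, since the exact window isn't stated) and uses the ~80m rows/day figure at face value:

```python
# Back-of-envelope row count from the figures above.
# Assumption: "4 months" is taken as 120 days.

ROWS_PER_DAY = 80_000_000       # ~80m rows a day
DAYS = 120                      # assumed length of "4 months"
CLAIMED_ROWS = 17_000_000_000   # the 17bn shown on the graph


def estimate_rows(rows_per_day: int, days: int) -> int:
    """Total rows accumulated over a window at a constant daily rate."""
    return rows_per_day * days


expected = estimate_rows(ROWS_PER_DAY, DAYS)
print(f"expected: {expected / 1e9:.1f}bn rows")
print(f"claimed:  {CLAIMED_ROWS / 1e9:.1f}bn rows")
print(f"claimed/expected: {CLAIMED_ROWS / expected:.2f}x")
```

That comes out at about 9.6bn, in line with the ~9.5bn estimate, so the graph's 17bn is roughly 1.8x what 4 months of data should produce.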

Typical columnar compression gives about 11GB per 1bn rows, so 17bn rows should take about 187GB. The AWS machines they are using should be c3.4xlarge instances, which have 30GB of RAM each, so 6 of them give 180GB in total. That is already less than 187GB, and you can't run an in-memory column store at 100% RAM anyway: you need to run it at 50-70% so you have capacity for calculations, which leaves only 90-126GB for the data.
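The memory side can be checked the same way. This sketch takes the 11GB per 1bn rows figure and the guessed c3.4xlarge setup (6 nodes at 30GB each) at face value; both are estimates, not measured numbers. Utilization is given as an integer percentage so the arithmetic stays exact:

```python
# Does the compressed data fit in the usable RAM of the cluster?
# Assumptions: 11GB per 1bn rows compressed; 6 x c3.4xlarge at 30GB each.

GB_PER_BN_ROWS = 11
NODE_RAM_GB = 30
NODES = 6


def data_size_gb(rows_bn: float) -> float:
    """Compressed size of rows_bn billion rows."""
    return rows_bn * GB_PER_BN_ROWS


def usable_ram_gb(utilization_pct: int) -> float:
    """RAM available for data when the store runs at utilization_pct% of total."""
    return NODES * NODE_RAM_GB * utilization_pct / 100


low, high = usable_ram_gb(50), usable_ram_gb(70)
for rows_bn in (9.5, 17):
    need = data_size_gb(rows_bn)
    verdict = "fits" if need <= high else "does not fit"
    print(f"{rows_bn}bn rows -> {need:.1f}GB; usable RAM {low:.0f}-{high:.0f}GB; {verdict}")
```

9.5bn rows comes to about 105GB, which fits under the 70% ceiling (though not the 50% one), while 17bn rows needs 187GB, more than the cluster's entire 180GB.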

Is it just me, or do the results not make any sense? It seems likely they actually had 9.5bn rows (4 months of data), which conveniently is what the graph shows?