by embiggen 3508 days ago
Meanwhile, Google was sorting petabytes in under a minute on their clusters 6+ years ago. We've still got a long way to go in OSS land to compete with the big boys.

This post tells the "History of massive-scale sorting experiments at Google"

- https://cloud.google.com/blog/big-data/2016/02/history-of-ma...

When I asked why BigQuery doesn't do these sorts, the answer came straight from the post: "Nobody really wants a huge globally-sorted output. We haven’t found a single use case for the problem as stated."

These accomplishments are awesome nevertheless!

Disclaimer: I'm Felipe Hoffa, and I work for Google (http://twitter.com/felipehoffa).

Do you think you could ask someone and find out the cluster sizes they used for those sorts? They mention "With the largest cluster at Google under our control", but it would be more interesting to have an idea of actual numbers, even if just an order of magnitude.
I could ask - but then I wouldn't be able to publish unpublished numbers on my own (if I want to keep my job).

:)