| HN Mirror



	by embiggen 3508 days ago
	Meanwhile, Google was sorting petabytes in under a minute on their clusters 6+ years ago. We've still got a long way to go in OSS land to compete with the big boys.


fhoffa 3508 days ago

This post tells the "History of massive-scale sorting experiments at Google"

- https://cloud.google.com/blog/big-data/2016/02/history-of-ma...

When I asked why BigQuery doesn't do these sorts, the answer came straight from the post: "Nobody really wants a huge globally-sorted output. We haven’t found a single use case for the problem as stated."

These accomplishments are awesome nevertheless!

Disclaimer: I'm Felipe Hoffa, and I work for Google (http://twitter.com/felipehoffa).


andrioni 3508 days ago

Do you think you could ask someone and find out the cluster sizes they used for those sorts? They mention "With the largest cluster at Google under our control", but it would be more interesting to have an idea of actual numbers, even if just an order of magnitude.


fhoffa 3508 days ago

I could ask - but then I wouldn't be able to publish unpublished numbers on my own (if I want to keep my job).
