Hacker News new | ask | show | jobs
by ThomasMoll 923 days ago
Not only that but the Hadoop team literally had the guy who wrote the original HDFS whitepaper. Moving a service with that much in house expertise first never made sense. I worked on one of the original Azure PoCs for Hadoop, even before Blueshift and it was immediately clear that we operated at a scale that Azure couldn't handle at the time. Our biggest cluster had over 500PB and total we had over an exabyte as of 2021 [1]. It was exorbitantly expensive to run a similar setup on VMs, and at the scale that we had I think it would have taken over 4,000 - 5,000 separate Azure Data Lake namespaces to support one of our R&D clusters. I believe most of this "make the biggest cluster you can" mentality was a hold over from the Yahoo! days.

[1] https://engineering.linkedin.com/blog/2021/the-exabyte-club-...