I used to work for Amazon. Code quality at places like Google and Amazon tends to be good.
S3 has a really good architecture and a great implementation.
HDFS has a meh architecture with a bad implementation.
There were obvious signs. I remember when Twitter investigated why HDFS was slow and found that the Hadoop developers had implemented their own dictionary for configuration, one with much worse time complexity than the default dictionary in Java. There might be a video about this somewhere.
And there are more things like that. I used to have HDFS Jira tickets that stayed open for 5-10 years. I just gave up.
Here is a video:
https://www.youtube.com/watch?v=jupArYWxoq0
Hadoop is full of these things.
One more thing:
https://lamport.azurewebsites.net/tla/formal-methods-amazon....
I would love to see a similar approach applied to Hadoop.