by spullara 4367 days ago
The name node is a significant vertical scaling challenge. It is one of the reasons that Yahoo limits their cluster sizes.

That's right. Note, however, that the need to horizontally scale the equivalent of the NameNode kicks in only when you have a very large storage system, where "large" can be defined as:

"You know you have a large storage system when you get paged at 1 AM because you only have a few petabytes of storage left." [1]

Even below that size you might still have a large system, and even where a single metadata master node is a sound solution, you still have a lot of interesting problems to solve. Don't make systems more complex than necessary.

More pointers to Google's public material on Colossus are in [2].

[1] http://static.googleusercontent.com/media/research.google.co...

[2] http://www.highlyscalablesystems.com/3202/colossus-successor...

Very few enterprise clusters reach this limit, and by limit I mean 1k to 2k nodes. In reality there is very little demand for NameNode scalability at the moment, but once that changes, the community will implement it.
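
For a rough feel of why a single NameNode is a vertical scaling problem at all, here is a back-of-envelope sketch. It assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file or block); that figure, the 1.5 blocks-per-file ratio, and the function name are illustrative assumptions, not numbers from this thread.

```python
# Back-of-envelope NameNode heap estimate.
# ASSUMPTION: ~150 bytes of heap per namespace object (file or block),
# a commonly quoted rule of thumb, not a measured value.
BYTES_PER_OBJECT = 150


def namenode_heap_bytes(num_files: int,
                        blocks_per_file: float = 1.5,
                        bytes_per_object: int = BYTES_PER_OBJECT) -> int:
    """Estimate heap needed to hold file and block metadata in memory."""
    # ASSUMPTION: average blocks per file is a made-up default for illustration.
    num_blocks = int(num_files * blocks_per_file)
    return (num_files + num_blocks) * bytes_per_object


if __name__ == "__main__":
    for files in (10_000_000, 100_000_000, 1_000_000_000):
        gib = namenode_heap_bytes(files) / 2**30
        print(f"{files:>13,} files -> ~{gib:,.1f} GiB of heap")
```

The point is only that metadata held in one process grows linearly with the number of files and blocks, so past some size the only option left on a single master is a bigger machine.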