by Qu3tzal
2325 days ago
Isn't it the same as before? If 4 GB of data was too big because you had 2 GB of RAM, then the methods used at that time are the same ones you would apply to a 500 GB dataset that can't fit in a machine with 250 GB of RAM, right? New issues appear when you have to analyze 2 TB on a machine with 32 GB of RAM, but when the ratio of data size to memory is the same, the issues, and thus the answers, are the same as before?
|
Also, the rest of the use cases (the ones that now fit into a single machine's memory) can be handled much more efficiently with memory-based algorithms instead of I/O-based algorithms.
The goal of Hadoop, as well as most of the theory on disk-based indices (e.g. B-trees), was to overcome I/O bottlenecks. But as memory gets bigger and cheaper, there is a trend to drop Hadoop in favor of reading data directly from the cloud into memory.
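
To make the I/O-based vs. memory-based distinction concrete, here is a rough sketch in Python. The function names, the `value` column, and the tiny sample data are all made up for illustration; nothing here is Hadoop's API. The first function reads one row at a time, so memory use stays flat however large the input is; the second loads the whole column first, which is only an option when the data fits in RAM.

```python
import csv
import io


def mean_streaming(lines, column):
    """I/O-based style: single pass, constant memory regardless of input size."""
    total = 0.0
    count = 0
    for row in csv.DictReader(lines):
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")


def mean_in_memory(lines, column):
    """Memory-based style: materialize the whole column, then compute on it."""
    values = [float(row[column]) for row in csv.DictReader(lines)]
    return sum(values) / len(values) if values else float("nan")


# Hypothetical sample standing in for a file too large (or small enough) for RAM.
SAMPLE = "id,value\n1,10\n2,20\n3,30\n"

streamed = mean_streaming(io.StringIO(SAMPLE), "value")
loaded = mean_in_memory(io.StringIO(SAMPLE), "value")
```

Both give the same answer. Once the column is in memory, though, you can sort it, index it, or scan it repeatedly without going back to disk, which is where the memory-based approach tends to win when the data fits.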