|
|
|
|
|
by praseodym
4047 days ago
|
|
And if your data doesn't fit in a single server's RAM, just buy some more and run Apache Spark [1] on them. It's an in-memory computation engine that's really nice to program for: you don't have to worry about low-level clustering details (like MapReduce). And it's way (10-100x) faster than Hadoop. [1] https://spark.apache.org |
|
The recent addition of SparkR in 1.4 means that now data scientists can leverage in memory data in the cluster that has been put there by output from either Scala or DW developers.
Combine it with Tachyon (http://tachyon-project.org) and it's not hard to imagine petabytes of data all processed in memory.