by ryanbrush
4860 days ago
I wish the post had gone into depth on _why_ Redshift was significantly faster, but I'm betting it uses in-memory joins (hence the size limitations it mentions), whereas Hive joins are just MapReduce jobs that keep only minimal subsets of data in memory at any given point. The upshot is that the Hive/MapReduce strategy isn't limited by physical memory. Of course, if your data set can fit in memory, then Redshift or a similar technology is probably a better choice than Hive. But it's important to remember that the performance gains here come as the result of a tradeoff.
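To make the tradeoff concrete, here is a rough Python sketch of the two join styles I have in mind. It is only an illustration of the general idea, not how Redshift or Hive actually implement joins; the function names `hash_join` and `merge_join` are made up for this example, and the streaming version assumes both inputs are already sorted by key (roughly what the shuffle/sort phase of a MapReduce job would give you).

```python
from itertools import groupby
from operator import itemgetter


def hash_join(left, right):
    """In-memory join: build a hash table over ALL of `left`, then probe it.

    Fast, but the whole `left` side has to fit in memory.
    """
    table = {}
    for key, value in left:
        table.setdefault(key, []).append(value)
    return [(key, lv, rv) for key, rv in right for lv in table.get(key, [])]


def merge_join(left_sorted, right_sorted):
    """Streaming join over inputs that are already sorted by key.

    Only the left-side rows for the current key are buffered, so memory use
    is bounded by the largest key group rather than by the data set size.
    """
    left_groups = groupby(left_sorted, key=itemgetter(0))
    right_groups = groupby(right_sorted, key=itemgetter(0))
    lkey, lgroup = next(left_groups, (None, None))
    rkey, rgroup = next(right_groups, (None, None))
    while lgroup is not None and rgroup is not None:
        if lkey < rkey:
            lkey, lgroup = next(left_groups, (None, None))
        elif lkey > rkey:
            rkey, rgroup = next(right_groups, (None, None))
        else:
            left_values = [v for _, v in lgroup]  # buffer one key group only
            for _, rv in rgroup:
                for lv in left_values:
                    yield (lkey, lv, rv)
            lkey, lgroup = next(left_groups, (None, None))
            rkey, rgroup = next(right_groups, (None, None))


# Toy inputs, both sorted by key.
left = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
right = [(2, "x"), (2, "w"), (3, "y"), (4, "z")]

hash_result = hash_join(left, right)
merge_result = list(merge_join(iter(left), iter(right)))
```

Both produce the same rows; the difference is that `hash_join` needs one whole side resident in memory, while `merge_join` can read both sides as streams from disk at the cost of sorting them first.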
The comparison is a complete and utter joke.