by jupiter90000
3411 days ago
A different example: a simple 'group by' Spark SQL query over only about 20 million rows in a distributed Phoenix/HBase table couldn't even complete, because Spark dumbly shuffled all the data around the cluster. The Spark/Phoenix RDD driver apparently had no 'group by' pushdown support for Phoenix, so every row was pulled into Spark and shuffled, which was amazingly inefficient. Running the same query directly on Phoenix took about a minute. My point is that these 'fits in memory on a laptop/single machine' examples don't really tell me when I might actually want to use Spark and the like.
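To make the pushdown point concrete, here is a toy sketch. It uses Python's built-in sqlite3 as a stand-in for Phoenix, which is purely an assumption for illustration; the table name `events`, its columns, and the helper functions are all hypothetical. It only counts how many rows cross the storage boundary in each case and says nothing about real Spark or Phoenix performance.

```python
import sqlite3


def build_table(n_rows=20_000, n_groups=5):
    """Create an in-memory table standing in for the Phoenix table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (grp INTEGER, val INTEGER)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        ((i % n_groups, 1) for i in range(n_rows)),
    )
    return conn


def group_by_client_side(conn):
    """No pushdown: fetch every row, then aggregate on the client."""
    rows_moved = 0
    totals = {}
    for grp, val in conn.execute("SELECT grp, val FROM events"):
        rows_moved += 1
        totals[grp] = totals.get(grp, 0) + val
    return totals, rows_moved


def group_by_pushed_down(conn):
    """Pushdown: the store aggregates, and only one row per group comes back."""
    rows = conn.execute(
        "SELECT grp, SUM(val) FROM events GROUP BY grp"
    ).fetchall()
    return dict(rows), len(rows)


conn = build_table()
local_totals, moved_local = group_by_client_side(conn)
pushed_totals, moved_pushed = group_by_pushed_down(conn)

# Same answer either way; the difference is how many rows had to move.
print(local_totals == pushed_totals)
print(moved_local, moved_pushed)
```

Both paths produce the same totals, but the first one moves every row out of the store before aggregating, while the second moves one row per group. Scale that gap to 20 million rows spread across a cluster and you get the shuffle described above.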
[1] https://issues.apache.org/jira/browse/PHOENIX-3600