by monstrado
4780 days ago
We have a 14-node cluster; each node has between 4 and 6 disks. Performance has been pretty amazing: we can run ad-hoc queries on this 4.5B-row table. Each node gets about 1.3 GB/s of read throughput for full table scans (data is Snappy-compressed and stored as RCFile, a columnar format).

What drawbacks have you found with Impala? I've been keeping an eye on it, and also Shark: http://shark.cs.berkeley.edu/