by vosper
4780 days ago
That sounds pretty fantastic. When you say "ad-hoc", do you mean it's fast enough to be queried directly from a UI? Are we talking seconds or minutes for your queries? And what drawbacks have you found with Impala? I've been keeping an eye on it, and also on Shark: http://shark.cs.berkeley.edu/
We haven't had a chance to optimize our environment yet, either. For example, our nodes are still running a fairly old version of CentOS, so we have LLVM disabled (enabling it would help a lot for huge batch computations; see http://blog.cloudera.com/blog/2013/02/inside-cloudera-impala...).
Also, our data is stored in RCFile, which is not exactly the most optimized columnar storage format. We're working on a plan to move everything over to the new Parquet (http://parquet.io/) columnar format for another performance boost.
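For anyone wondering why a columnar format matters for this kind of workload, here is a toy Python sketch of the general idea. The data, field names, and helpers are all hypothetical, and this is not how RCFile or Parquet actually lay out bytes on disk: a query that aggregates one column only has to read that column's values in a columnar layout, while a row-oriented layout makes it scan every field of every record.

```python
# Toy sketch: why a columnar layout helps a scan that touches one column.
# Hypothetical data and helpers; NOT RCFile's or Parquet's real encoding.

FIELDS = ["user_id", "country", "latency_ms"]
ROWS = [
    (1, "US", 120),
    (2, "DE", 95),
    (3, "US", 143),
]


def row_layout(rows):
    """Store all values record by record, like a row-oriented file."""
    return [value for row in rows for value in row]


def column_layout(rows):
    """Store each field's values contiguously, like a columnar file."""
    return {name: [row[i] for row in rows] for i, name in enumerate(FIELDS)}


def sum_from_rows(flat, field):
    """Walk the whole row-ordered buffer; return (sum, values_scanned)."""
    idx = FIELDS.index(field)
    total = 0
    scanned = 0
    for pos, value in enumerate(flat):
        scanned += 1  # every stored value is visited, wanted or not
        if pos % len(FIELDS) == idx:
            total += value
    return total, scanned


def sum_from_columns(columns, field):
    """Read only the requested column; return (sum, values_scanned)."""
    values = columns[field]
    return sum(values), len(values)


flat = row_layout(ROWS)
cols = column_layout(ROWS)
print(sum_from_rows(flat, "latency_ms"))     # (358, 9)
print(sum_from_columns(cols, "latency_ms"))  # (358, 3)
```

Same answer either way, but the columnar version touches a third of the values here, and that gap grows with the number of columns a table has.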
We haven't come across any real drawbacks with Impala so far; it fits our needs pretty well.
Disclaimer: I work on Cloudera's internal Tools Team, and we like to dogfood our stuff :).
Edit: One drawback of Impala is the lack of UDF support, but that is coming in a later release.