Hacker News
by sveiss 4377 days ago
Llama+Impala isn't quite ready for prime time in my experience. The biggest issue is the reliance on Impala's query size estimates to determine how many resources to request from YARN. We find that these estimates are frequently off from reality by an order of magnitude or so.

Agreed, and Llama also doesn't support high availability at the moment (soon to be fixed). We rely heavily on up-to-date table/column statistics to accurately estimate resource consumption. Unfortunately, Impala doesn't currently have incremental/background stats, which should be in the 2.0 release.