by jdcryans 3797 days ago

You seem to have a pretty typical use case that we're targeting. One thing to understand about Kudu is that it doesn't run queries; it only stores the data. You can use Impala or Drill on top of it: they'll figure out the locality, apply the aggregations properly, and push the filters down to Kudu.

Did you initially pick ES over systems like Impala because of the lack of real-time inserts/updates when used with HDFS?

BTW, here's a presentation that might help you understand Kudu: http://www.slideshare.net/jdcryans/kudu-resolving-transactio...

lobster_johnson 3797 days ago

Thanks, that's helpful. We picked ES for several reasons. We're not a Java shop, and the Hadoop ecosystem is heavily biased towards JVM languages.

Secondly, ES is easy to deploy and manage. Being on the JVM, it admittedly has a considerable RAM footprint, but at least it's just one daemon per node. With anything related to Hadoop, it seems you have this cascade of JVM processes that inevitably need management. And lots and lots of RAM.

Thirdly, as you point out, it's easy to do real-time writes.

I do like the fact that Kudu is C++.

jdcryans 3797 days ago

Well, TBH you do get to pick which Hadoop-related components need to run, and HDFS's DataNode itself is happy with just a bit of RAM. I do understand the concern though.

You're probably happy with what you have in prod but if you get some time to try out Kudu feel free to drop by our Slack channel for a chat! http://getkudu.io/community.html

lobster_johnson 3796 days ago

Thanks — I will definitely be keeping an eye on the project.