	by nfa_backward 3915 days ago
	Kudu is being positioned as filling the gap between HDFS and HBase. After reading the overview, I see it more as bringing together features of HDFS+Parquet and HBase. Does that sound reasonable? Super excited about this, and even more so since it is open source. Thank you!

tlipcon 3915 days ago

Yep, that's correct. HDFS+Parquet is more accurate but doesn't fit quite as well on slides and short descriptions.

The idea is to get the analytic scan performance of Parquet while still allowing for in-place updates and row-by-row access like HBase.

HDFS (with Parquet or other formats) will still be better for unstructured or fully immutable datasets. HBase will still be better when your top priority is ingest rate, random access, and semi-structured data. Kudu should be good when you've got tabular data as described above.
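To make the trade-off concrete, here is a minimal toy sketch of a table that stores each column contiguously (so a scan touches only the column it needs) while keeping a primary-key index (so a single row can be read or updated in place). The class name `ToyColumnarTable` and its methods are hypothetical and purely illustrative; this is not Kudu's actual storage design or API.

```python
from typing import Dict, List


class ToyColumnarTable:
    """Toy in-memory table: one list per column plus a primary-key index.

    Hypothetical illustration only; NOT how Kudu stores data internally.
    """

    def __init__(self, key_column: str, columns: List[str]) -> None:
        if key_column not in columns:
            raise ValueError("key_column must be one of columns")
        self.key_column = key_column
        # Column-oriented storage: each column's values live together.
        self.columns: Dict[str, list] = {name: [] for name in columns}
        # Primary key -> row position, for row-by-row access.
        self.row_of_key: Dict[object, int] = {}

    def upsert(self, row: dict) -> None:
        """Insert a new row, or overwrite the given cells of an existing row."""
        key = row[self.key_column]
        pos = self.row_of_key.get(key)
        if pos is None:
            # New key: append one value to every column (None if missing).
            self.row_of_key[key] = len(self.columns[self.key_column])
            for name, values in self.columns.items():
                values.append(row.get(name))
        else:
            # Existing key: update only the supplied cells, in place.
            for name, value in row.items():
                self.columns[name][pos] = value

    def get(self, key) -> dict:
        """Random access: reassemble one row from all columns."""
        pos = self.row_of_key[key]
        return {name: values[pos] for name, values in self.columns.items()}

    def scan_sum(self, column: str):
        """Analytic scan: reads a single column and ignores the others."""
        return sum(self.columns[column])


table = ToyColumnarTable("id", ["id", "host", "latency_ms"])
table.upsert({"id": 1, "host": "a", "latency_ms": 10})
table.upsert({"id": 2, "host": "b", "latency_ms": 30})
table.upsert({"id": 1, "latency_ms": 20})   # in-place update of row 1

print(table.scan_sum("latency_ms"))  # 50
print(table.get(1))                  # {'id': 1, 'host': 'a', 'latency_ms': 20}
```

A real system also has to deal with on-disk encoding, concurrency, and durability, none of which this sketch attempts.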

nfa_backward 3915 days ago

Impala has an in-memory columnar format on its roadmap for 2016. Is that format being designed with Kudu in mind?

Edit: I understand that the formats, while both columnar, serve different purposes. I am more curious about the overlap, if any, between the two.

tlipcon 3915 days ago

Yep, I've been taking part in those design discussions. We hope to have Kudu tablet servers support generating this in-memory format in shared memory as the result of scans, so the Impala server (a client from Kudu's perspective) can operate directly on the data. We're expecting a 20-30% speed boost from this for some queries, though we haven't yet tested the prototype at scale.
