by tlipcon
3915 days ago
Yep, that's correct. HDFS+Parquet is more accurate but doesn't fit quite as well on slides and short descriptions. The idea is to get the analytic scan performance of Parquet while still allowing for in-place updates and row-by-row access like HBase. HDFS (with Parquet or other formats) will still be better for unstructured or fully immutable datasets. HBase will still be better when your top priority is ingest rate, random access, and semi-structured data. Kudu should be good when you've got tabular data as described above.
Edit: I understand that the formats, while both columnar, serve different purposes. I am more curious about the overlap, if any, between the two.