|
|
|
|
|
by jetblackio
3154 days ago
|
|
Not sure if this is a good place to ask, but how do Apache Arrow and Parquet compare to Apache Kudu (https://kudu.apache.org/)? Seems like all three are columnar data solutions, but it's not clear when you'd use one over the other. Kind of surprised the article didn't mention Kudu for that matter. |
|
Apache Kudu is much more than a file format - it is a columnar distributed storage engine. One way to think of Kudu is as mutable Parquet, but really it's a database backend that integrates with Impala and Spark for SQL, among other systems. It's fault tolerant, manages partitioning for you, secure, and much more. For a quick introduction to Kudu you can check out this short slide deck I put together over a year ago... it's a bit dated but a good overview: https://www.slideshare.net/MichaelPercy3/intro-to-apache-kud...
For more up-to-date information, follow the Apache Kudu Blog at http://kudu.apache.org/blog/ or follow the official Apache Kudu twitter account @ApacheKudu.