by gizmodo59 2054 days ago
Another SQL engine for data lakes that makes heavy use of Apache Arrow is Dremio.

https://www.dremio.com/webinars/apache-arrow-calcite-parquet...

https://github.com/dremio/dremio-oss

If you have Parquet files on S3, an engine like Dremio (or any other Arrow-based engine) can deliver impressive query performance. Key open-source innovations in data analytics and data lakes:

Arrow - columnar in-memory format
Gandiva - LLVM-based execution kernel
Arrow Flight - wire protocol based on Arrow
Project Nessie - Git-like workflow for data lakes

https://arrow.apache.org/
https://arrow.apache.org/docs/format/Flight.html
https://arrow.apache.org/blog/2018/12/05/gandiva-donation/
https://github.com/projectnessie/nessie
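To make the "columnar in-memory format" point a bit more concrete, here is a minimal pure-Python sketch (not the Arrow library, and not Dremio's code) of an Arrow-style int64 column. Values live in one contiguous typed buffer and nulls are tracked in a separate validity bitmap, least-significant bit first, as the Arrow columnar format describes. The sample rows and the helper names (`to_columns`, `build_int64_column`, `is_valid`, `column_sum`) are hypothetical, chosen only for illustration.

```python
from array import array

# Hypothetical sample data in a row-oriented layout: one tuple per record.
rows = [
    ("alice", 34, 5.0),
    ("bob", None, 7.5),
    ("carol", 29, None),
]


def to_columns(rows, n_fields):
    """Transpose row tuples into one list per field (columnar layout)."""
    return [[row[i] for row in rows] for i in range(n_fields)]


def build_int64_column(values):
    """Build an Arrow-style int64 column: a contiguous value buffer plus a
    validity bitmap where bit i set means slot i is non-null (LSB first)."""
    # Null slots get a placeholder 0; Arrow leaves their contents unspecified.
    data = array("q", (0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return data, bitmap


def is_valid(bitmap, i):
    """Return True if slot i is marked non-null in the validity bitmap."""
    return bool(bitmap[i // 8] & (1 << (i % 8)))


def column_sum(data, bitmap):
    """Sum the non-null slots by scanning one buffer plus the bitmap,
    without touching any other field of any row."""
    return sum(data[i] for i in range(len(data)) if is_valid(bitmap, i))


if __name__ == "__main__":
    names, ages, scores = to_columns(rows, 3)
    data, bitmap = build_int64_column(ages)
    print(column_sum(data, bitmap))
```

An aggregate like `column_sum` reads one typed buffer and a small bitmap instead of walking every field of every row, which is part of why columnar engines do well on analytic scans. Real Arrow implementations (e.g. pyarrow) add buffer alignment, padding, and many more data types on top of this basic idea.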

What Apache Arrow-based service could I use to replace Athena / PrestoDB?
Looking at Dremio’s website, they seem to be a good competitor to Presto/Athena for some use cases.

The right alternative depends on your use case. If it’s about querying S3 data, then Dremio, Athena, Presto, and Spark are all good options.