by isjustintime
462 days ago
This is pretty exciting. DuckDB is already proving to be a powerful tool in the industry. Previously there was a strong trend toward simple S3-backed blob storage with Parquet, queried with Athena, for data lakes. It felt like things had since gotten pretty complicated, but as integrations improve and Apache Iceberg matures, I'm seeing a shift toward greater flexibility with less SaaS/tool sprawl in data lakes.
May be of interest to people who:
- Want to know what DuckDB is and why it's interesting
- Want to know what's good about it
- Want to know why, for orgs without huge data, we will hopefully see a lot more 's3 + duckdb' rather than more complex architectures and services, and hopefully (IMHO) less Spark!
https://www.robinlinacre.com/recommend_duckdb/
I think most people in data science or data engineering should at least try it to get a sense of what it can do.
Really, for me, the most important thing is that it makes it so much easier to design and test complex ETL, because you're not constantly having to run queries against Athena/Spark to check they work. You can do it all locally, in CI, set up tests, etc.