by sroerick 960 days ago
I really love DuckDB for one-offs and analytics, and I wonder if anybody here has experience using it with medium-sized data.

I still seem to run into the workflow problem where data has to be in proximity to compute in order to function.

If I need to run joins on a 5-10 GB Parquet file or table, then unless I have it sitting locally, the performance bottleneck is not the database.

I still find myself reaching for Databricks / Spark for most tasks for this reason.

I suppose this is what MotherDuck is trying to solve? But it just doesn't feel like it's quite there yet for me. Does anybody who is better at this stuff than me have thoughts?

2 comments

I really like MotherDuck's hybrid execution model. My DS colleagues love abusing our data warehouse, so bringing the data down locally to pound on is a win-win.
Hi, head of Produck at MotherDuck here.

Yes, we are indeed a good use case for this. For one, we built a fully-fledged managed storage system on top of DuckDB, with better performance and caching and the like. Two, we're going to be pretty good at reading from S3 because we've optimized that path. Three, our storage has sharing/IAM and is about to have things like zero-copy clone and time travel.

Happy to answer any Qs.