by ignoramous 1482 days ago
We do something similar, but:

- instead of S3, we now use R2.

- instead of Postgres+Sqlite3, we use DuckDB+CSV/Parquet.

- instead of Lambda, we use AWS AppRunner (considering moving it to Fly.io or Workers).

It has worked gloriously for a variety of analytical workloads, even if it is slower than it would have been had we used ClickHouse/Timescale/Redshift/Elasticsearch.
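
To give a rough idea of the DuckDB+Parquet side, here is a minimal sketch that assumes the Parquet files are laid out one directory per day. The bucket name, the `dt=` directory layout, the `domain` column, and both helper names are made up for illustration and are not our actual setup. It only builds the SQL text; running it against R2 or S3 would additionally need DuckDB's httpfs extension and credentials configured.

```python
from datetime import date, timedelta
from typing import List


def partition_paths(base: str, start: date, end: date) -> List[str]:
    """Return one Parquet glob per day in [start, end], inclusive.

    Assumes a hypothetical layout of <base>/dt=YYYY-MM-DD/*.parquet.
    """
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days
    return [
        f"{base}/dt={(start + timedelta(days=i)).isoformat()}/*.parquet"
        for i in range(days + 1)
    ]


def build_query(base: str, start: date, end: date) -> str:
    """Build a DuckDB query that scans only the requested day partitions."""
    files = ", ".join(f"'{p}'" for p in partition_paths(base, start, end))
    # read_parquet accepts a list of files or globs, so only the listed
    # partitions are read instead of the whole bucket.
    return (
        "SELECT domain, count(*) AS hits "
        f"FROM read_parquet([{files}]) "
        "GROUP BY domain ORDER BY hits DESC LIMIT 10"
    )


# Hypothetical bucket and date range, purely for illustration.
query = build_query("s3://example-bucket/logs", date(2022, 3, 1), date(2022, 3, 3))
print(query)
```

The point of the sketch is that easily-partitioned data lets each request touch a handful of files, which is a large part of why a single-process engine can be enough at low request rates.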

1 comments

How has your experience been with DuckDB in production? It is a relatively new project. How is its reliability?
For our scale and request patterns (easily-partitioned / 0.1 qps), no major issues, but the JavaScript bindings that I use (which are different from their wasm bindings) leave a lot to be desired. To DuckDB's credit, they seem to have top-notch C++ and Python bindings that even support the efficient memory-mapped Arrow format, which is purpose-built for cross-language / cross-process data exchange, in addition to being a top-notch in-memory representation for pandas-like data-frames.

Granted, DuckDB is under constant development, but it doesn't yet have a native cross-version export/import feature (its developers say DuckDB hasn't reached the maturity needed to stabilise its on-disk format just yet).

I also keep an eye on the benchmarks at https://h2oai.github.io/db-benchmark/. As for Arrow-backed query engines, Pola.rs and DataFusion in particular sound the most exciting to me.

It also remains to be seen how Databricks' delta.io develops (it might come in handy for much, much larger data warehouses).