Hacker News new | ask | show | jobs
by delaminator 157 days ago
From my perspective, everything's DuckDB.

Single file per database, Multiple ingestion formats, full text search, S3 support, Parquet file support, columnar storage. fully typed.

WASM version for full SQL in JavaScript.

2 comments

This is a funny thread to me because my frustration is at the intersection of your comments: I keep wanting sqlite for writes (and lookups) and duckdb for reads. Are you aware of anything that works like this?
DuckDB can read/write SQLite files via extension. So you can do that now with DuckDB as is.

https://duckdb.org/docs/stable/core_extensions/sqlite

My understanding is that this is still too slow for quick inserts, because duckdb (like all columnar stores) is designed for batches.
The way I understood it, you can do your inserts with SQLite "proper", and simultaneously use DuckDB for analytics (aka read-only).
Aha! That makes so much sense. Thank you for this.

Edit: Ah, right, the downside is that this is not going to have good olap query performance when interacting directly with the sqlite tables. So still necessary to copy out to duckdb tables (probably in batches) if this matters. Still seems very useful to me though.

Analytics is done in "batches" (daily, weekly) anyways, right?

We know you can't get both, row and column orders at the same time, and that continuously maintaining both means duplication and ensuring you get the worst case from both worlds.

Local, row-wise writing is the way to go for write performance. Column-oriented reads are the way to do analytics at scale. It seems alright to have a sync process that does the order re-arrangement (maybe with extra precomputed statistics, and sharding to allow many workers if necessary) to let queries of now historical data run fast.

I think you could build an ETL-ish workflow where you use SQLite for OLTP and DuckDB for OLAP, but I suppose it's very workload dependent, there are several tradeoffs here.
Right. This is what I want, but transparently to the client. It seems fairly straightforward, but I keep looking for an existing implementation of it and haven't found one yet.
very interesting. whats the vector indexing story like in duckdb these days?

also are there sqlite-duckdb sync engines or is that an oxymoron

https://duckdb.org/docs/stable/core_extensions/vss

It's not bad if you need something quick. I haven't had a large need of ANN in duckdb since it's doing more analytical/exploratory needs, but it's definitely there if you need it.