HN Mirror

by delaminator 205 days ago

From my perspective, everything's DuckDB.

Single file per database, multiple ingestion formats, full-text search, S3 support, Parquet file support, columnar storage, fully typed.

WASM version for full SQL in JavaScript.

sanderjd 205 days ago

This is a funny thread to me because my frustration is at the intersection of your comments: I keep wanting sqlite for writes (and lookups) and duckdb for reads. Are you aware of anything that works like this?

nlittlepoole 205 days ago

DuckDB can read and write SQLite files via its sqlite extension, so you can already do that with DuckDB as is.

https://duckdb.org/docs/stable/core_extensions/sqlite

sanderjd 205 days ago

My understanding is that this is still too slow for quick inserts, because duckdb (like all columnar stores) is designed for batches.

theanonymousone 205 days ago

The way I understood it, you can do your inserts with SQLite "proper", and simultaneously use DuckDB for analytics (aka read-only).
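A minimal sketch of that split using only Python's standard `sqlite3` module, assuming the database runs in WAL mode so a separate read-only connection can query while inserts continue. The second connection only stands in for the analytics side (with DuckDB it would be a DuckDB connection attaching the same file through the sqlite extension); the file path, table and column names are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical database location; any writable directory works.
path = os.path.join(tempfile.mkdtemp(), "app.db")

# Writer: an ordinary SQLite connection handling the OLTP inserts.
writer = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL")  # readers no longer block the writer's commits
writer.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, amount REAL)")
writer.commit()

# Reader: a separate read-only connection standing in for the analytics side.
reader = sqlite3.connect(Path(path).as_uri() + "?mode=ro", uri=True)

def insert_event(kind, amount):
    """Small, frequent write: one row per transaction."""
    writer.execute("INSERT INTO events (kind, amount) VALUES (?, ?)", (kind, amount))
    writer.commit()

for kind, amount in [("click", 1.0), ("buy", 20.0), ("click", 2.5)]:
    insert_event(kind, amount)

# The reader sees every committed row without taking the write lock.
totals = dict(reader.execute(
    "SELECT kind, SUM(amount) FROM events GROUP BY kind ORDER BY kind").fetchall())
print(totals)  # {'buy': 20.0, 'click': 3.5}
```

WAL mode is the important bit: in SQLite's default rollback-journal mode, an open read transaction prevents the writer from committing until the read finishes.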

sanderjd 205 days ago

Aha! That makes so much sense. Thank you for this.

Edit: Ah, right, the downside is that this won't give good OLAP query performance when querying the SQLite tables directly. So it's still necessary to copy the data out to DuckDB tables (probably in batches) if that matters. Still seems very useful to me, though.
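One way to do that batch copy, sketched with the standard library only: treat the SQLite table as append-only and remember the largest `id` already copied (a high-water mark), so each sync moves just the new rows. The dict of per-column lists is a hypothetical stand-in for a DuckDB table, and updates or deletes are not captured by this scheme.

```python
import sqlite3

# Row store for writes (in-memory only to keep the sketch self-contained).
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, amount REAL)")

# Hypothetical column-oriented copy: one Python list per column.
# In a real setup this would be a DuckDB table.
columns = {"id": [], "kind": [], "amount": []}
last_synced_id = 0  # high-water mark: largest id already copied

def sync_batch(batch_size=1000):
    """Copy up to batch_size rows newer than the high-water mark; return how many."""
    global last_synced_id
    rows = oltp.execute(
        "SELECT id, kind, amount FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_synced_id, batch_size),
    ).fetchall()
    for row in rows:
        for name, value in zip(("id", "kind", "amount"), row):
            columns[name].append(value)  # row -> column rearrangement
    if rows:
        last_synced_id = rows[-1][0]
    return len(rows)

oltp.executemany("INSERT INTO events (kind, amount) VALUES (?, ?)",
                 [("click", 1.0), ("buy", 20.0), ("click", 2.5)])
oltp.commit()
print(sync_batch(batch_size=2), sync_batch(batch_size=2), sync_batch(batch_size=2))  # 2 1 0
print(sum(columns["amount"]))  # 23.5
```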

dietr1ch 205 days ago

Analytics is done in "batches" (daily, weekly) anyway, right?

We know you can't have both row and column order at the same time, and continuously maintaining both means duplicating the data and getting the worst of both worlds.

Local, row-wise writing is the way to go for write performance. Column-oriented reads are the way to do analytics at scale. It seems alright to have a sync process that does the order re-arrangement (maybe with extra precomputed statistics, and sharding to allow many workers if necessary) to let queries of now historical data run fast.
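The "extra precomputed statistics" idea can be illustrated with per-chunk min/max values (often called zone maps): when rows are rearranged into column chunks, each chunk records its range, and a range query skips chunks that cannot match. This is a toy sketch with hypothetical data, not how any particular engine lays out its storage.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """One column chunk plus precomputed min/max statistics (a 'zone map')."""
    values: list
    lo: float
    hi: float

def to_column_chunks(rows, col, chunk_size):
    """Rearrange row tuples into chunks of a single column, recording min/max."""
    column = [row[col] for row in rows]
    chunks = []
    for start in range(0, len(column), chunk_size):
        part = column[start:start + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks

def count_between(chunks, lo, hi):
    """Count values in [lo, hi]; return (count, number of chunks actually scanned)."""
    count, scanned = 0, 0
    for chunk in chunks:
        if chunk.hi < lo or chunk.lo > hi:
            continue  # the statistics prove nothing in this chunk can match
        scanned += 1
        count += sum(1 for v in chunk.values if lo <= v <= hi)
    return count, scanned

# Rows as a row store would hold them: (id, timestamp).
rows = [(i, 1000 + i) for i in range(100)]
ts_chunks = to_column_chunks(rows, col=1, chunk_size=10)
print(count_between(ts_chunks, 1005, 1012))  # (8, 2)
```

Pruning works best when data is clustered on the filtered column (here timestamps arrive in order), which is one reason a sync step might also sort. Independent chunks are also a natural unit for sharding the work across several workers.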

SchwKatze 205 days ago

I think you could build an ETL-ish workflow where you use SQLite for OLTP and DuckDB for OLAP, but I suppose it's very workload-dependent; there are several tradeoffs here.

sanderjd 205 days ago

Right. This is what I want, but transparently to the client. It seems fairly straightforward, but I keep looking for an existing implementation of it and haven't found one yet.
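The "transparent to the client" part mostly comes down to routing each statement to the right store. Below is a deliberately naive, hypothetical heuristic: writes and plain lookups go to SQLite, anything that aggregates goes to the analytical copy. A real implementation would want a proper SQL parser and a policy for how stale the analytical copy is allowed to be.

```python
import re

# Hypothetical heuristic: these keywords suggest an analytical (scan-heavy) query.
# It will misfire on, e.g., a column literally named "count".
AGGREGATE = re.compile(r"\b(GROUP\s+BY|SUM|AVG|COUNT|MIN|MAX)\b", re.IGNORECASE)

def route(sql: str) -> str:
    """Return 'oltp' (row store, e.g. SQLite) or 'olap' (column store, e.g. DuckDB)."""
    stripped = sql.strip()
    verb = stripped.split(None, 1)[0].upper() if stripped else ""
    if verb not in ("SELECT", "WITH"):
        return "oltp"  # writes, DDL and pragmas always hit the row store
    return "olap" if AGGREGATE.search(sql) else "oltp"

print(route("INSERT INTO events (kind, amount) VALUES ('buy', 20.0)"))  # oltp
print(route("SELECT kind, SUM(amount) FROM events GROUP BY kind"))     # olap
print(route("select * from events where id = ?"))                      # oltp
```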

swyx 205 days ago

very interesting. what's the vector indexing story like in duckdb these days?

also, are there sqlite-duckdb sync engines, or is that an oxymoron?

cfors 205 days ago

https://duckdb.org/docs/stable/core_extensions/vss

It's not bad if you need something quick. I haven't had much need for ANN in DuckDB since I use it more for analytical/exploratory work, but it's definitely there if you need it.
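For context, an ANN index such as the HNSW index the vss extension provides trades exactness for speed compared with an exact nearest-neighbour scan. The exact baseline, sketched here in plain Python with hypothetical 2-D vectors and cosine distance, looks like this; for small tables it may be fast enough on its own.

```python
import heapq
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def knn(query, items, k):
    """Exact k-nearest neighbours: score every item, keep the k closest."""
    return heapq.nsmallest(k, items, key=lambda item: cosine_distance(query, item[1]))

items = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.7, 0.7]), ("d", [-1.0, 0.0])]
print([name for name, _ in knn([1.0, 0.1], items, k=2)])  # ['a', 'c']
```

This scan is O(n) per query; an HNSW index exists to avoid exactly that cost on large tables, at the price of occasionally missing a true neighbour.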
