by chrisjc 857 days ago

Came here with similar questions and Cmd-F "DuckDB". See the comment about "data loaders". Seems like a "data loader" would provide most of what you're asking about.

I'm also thinking that a "data loader" combined with duckdb-wasm and arrow would be a pretty nice combination. I imagine it wouldn't be too difficult to switch between two implementations of the "data loader" as needed: one reading from a remote system (in your case DuckDB on a server), the other using DuckDB running locally in the browser (which can interact with its own remote or local data sources).

edit: welp https://observablehq.com/framework/lib/duckdb
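
A rough sketch of the swap described above, assuming nothing about the real Observable Framework or DuckDB APIs: `Row`, `DataLoader`, `remoteLoader`, `localLoader`, and `pickLoader` are all hypothetical names, and both backends are plain in-memory stand-ins rather than actual DuckDB calls. A real version would almost certainly be asynchronous; this one is synchronous only to stay short.

```typescript
// Hypothetical row shape, used only for illustration.
type Row = { name: string; value: number };

// Hypothetical common interface that both implementations satisfy.
interface DataLoader {
  kind: "remote" | "local";
  search(term: string): Row[];
}

// Stand-in for "DuckDB on a server": delegates to a caller-supplied function
// that would, in practice, send the query to the remote system.
function remoteLoader(fetchRows: (term: string) => Row[]): DataLoader {
  return { kind: "remote", search: (term) => fetchRows(term) };
}

// Stand-in for "DuckDB running locally in the browser": filters rows that
// are already available on the client.
function localLoader(rows: Row[]): DataLoader {
  return {
    kind: "local",
    search: (term) => {
      const needle = term.toLowerCase();
      return rows.filter((r) => r.name.toLowerCase().includes(needle));
    },
  };
}

// Choose an implementation; callers only ever see the DataLoader interface.
function pickLoader(
  useLocal: boolean,
  rows: Row[],
  fetchRows: (term: string) => Row[],
): DataLoader {
  return useLocal ? localLoader(rows) : remoteLoader(fetchRows);
}

// Made-up sample data and a fake "server" for the usage example.
const sample: Row[] = [
  { name: "Alpha", value: 1 },
  { name: "Beta", value: 2 },
];
const fakeServer = (term: string): Row[] =>
  sample.filter((r) => r.name.toLowerCase().includes(term.toLowerCase()));

const loader = pickLoader(true, sample, fakeServer);
console.log(loader.kind, loader.search("alp"));
```

Since the calling code depends only on `search`, flipping the `useLocal` flag is the whole switch between the two backends.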

recifs 857 days ago

See the example at https://huggingface.co/spaces/observablehq/fpdn where DuckDB is used both as a data loader (to download and digest 200GB worth of source data into a small 8MB parquet file) and on the client-side to allow the user to do live search queries on the minimized data. Server-side, we're using duckdb-the-binary, and client-side we're using duckdb-wasm.

kuatroka 857 days ago

So the 200GB loading and digesting part is totally separate from Observable Framework, right? You just do it with standard (non-wasm) DuckDB as part of ETL, and later you just direct Observable Framework to read and plot the 8MB file? Thanks

severo_bo 857 days ago

Nope, the Observable Framework data loader itself accesses the 200GB dataset. The code is here: https://huggingface.co/spaces/observablehq/fpdn/blob/main/do...
