by chrisjc 857 days ago
Came here with similar questions and Cmd-F "DuckDB". See the comment about "data loaders". Seems like a "data loader" would provide most of what you're asking about.

I'm also thinking that a "data loader" combined with duckdb-wasm and arrow would be a pretty nice combination. I imagine it wouldn't be too difficult to switch between two implementations of the "data loader" as needed: one that reads from a remote system (in your case DuckDB on a server) and one that uses DuckDB running locally in the browser (which can interact with its own remote or local data sources).
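
To make the "two implementations" idea concrete, here is a rough sketch of the shape, not anything taken from Observable Framework. It is written in Python for brevity; in the browser the local side would be duckdb-wasm in JavaScript. The names `QueryBackend`, `RemoteBackend`, `LocalBackend` and `make_backend`, and the JSON-over-HTTP endpoint that accepts `{"sql": ...}`, are all hypothetical. The local side assumes the third-party `duckdb` Python package.

```python
import json
import urllib.request
from typing import Protocol

Row = dict[str, object]


class QueryBackend(Protocol):
    """Anything that can run a SQL string and return rows as dicts."""

    def query(self, sql: str) -> list[Row]: ...


class RemoteBackend:
    """Sends SQL to a server that runs DuckDB (hypothetical JSON endpoint)."""

    def __init__(self, url: str) -> None:
        self.url = url

    def query(self, sql: str) -> list[Row]:
        request = urllib.request.Request(
            self.url,
            data=json.dumps({"sql": sql}).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # Assumes the server answers with a JSON array of row objects.
        with urllib.request.urlopen(request) as response:
            return json.load(response)


class LocalBackend:
    """Runs SQL in-process with DuckDB; connects lazily on first query."""

    def __init__(self, database: str = ":memory:") -> None:
        self.database = database
        self._con = None

    def query(self, sql: str) -> list[Row]:
        if self._con is None:
            import duckdb  # third-party: pip install duckdb

            self._con = duckdb.connect(self.database)
        cursor = self._con.execute(sql)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]


def make_backend(mode: str, url: str = "", database: str = ":memory:") -> QueryBackend:
    """Pick an implementation; callers only ever see `query(sql)`."""
    if mode == "remote":
        return RemoteBackend(url)
    if mode == "local":
        return LocalBackend(database)
    raise ValueError(f"unknown backend mode: {mode!r}")
```

The calling code stays the same either way, e.g. `make_backend("local").query("SELECT 42 AS answer")`, so swapping is a one-line change as long as both sides return rows in the same shape.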

edit: welp https://observablehq.com/framework/lib/duckdb


See the example at https://huggingface.co/spaces/observablehq/fpdn where DuckDB is used both as a data loader (to download and digest 200GB worth of source data into a small 8MB parquet file) and on the client-side to allow the user to do live search queries on the minimized data. Server-side, we're using duckdb-the-binary, and client-side we're using duckdb-wasm.
So the 200GB loading and digesting part is totally separate from Observable Framework, right? You just do it with a standard (non-wasm) DuckDB as part of ETL, and later you just direct Observable Framework to read and plot the 8MB file? Thanks
Nope, the Observable Framework data loader accesses the 200GB dataset. The code is here: https://huggingface.co/spaces/observablehq/fpdn/blob/main/do...
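
For anyone wondering what a DuckDB-backed data loader could look like, here is a minimal sketch. It is not the code behind the link above. A Framework data loader is a script whose stdout becomes the generated file, so this one has DuckDB write a small Parquet file to a temp directory and then streams the bytes out. It assumes the third-party `duckdb` Python package; the `category` column, the aggregation, and the function names are placeholders, not the real dataset's schema.

```python
import sys
import tempfile
from pathlib import Path


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_digest_sql(source: str, out_path: str) -> str:
    """Build a COPY statement that aggregates `source` into a Parquet file.

    `category` is a placeholder column, not taken from the real dataset.
    """
    return (
        "COPY ("
        "SELECT category, count(*) AS n "
        f"FROM read_parquet({quote_literal(source)}) "
        "GROUP BY category"
        f") TO {quote_literal(out_path)} (FORMAT PARQUET)"
    )


def main(source: str) -> None:
    """Digest `source` with DuckDB and write the Parquet bytes to stdout."""
    import duckdb  # third-party: pip install duckdb

    with tempfile.TemporaryDirectory() as tmp:
        out_path = str(Path(tmp) / "digest.parquet")
        con = duckdb.connect()  # in-memory database
        con.execute(build_digest_sql(source, out_path))
        con.close()
        # Framework captures the loader's stdout as the generated file.
        sys.stdout.buffer.write(Path(out_path).read_bytes())
```

Saved as something like `digest.parquet.py` with a final `main("path/to/source/*.parquet")` call, the big source data is only touched when the loader runs, and the browser only ever downloads the small Parquet output, which duckdb-wasm can then query client-side.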