by platypii
409 days ago
DuckDB and DataFusion are super cool! But they are VERY large wasm blobs (30-40 MB each), which is often larger than the data you're trying to load, and they add complexity to serving and deploying wasm files. Hyparquet is 10 KB of pure JS, so it's trivial to deploy in a modern web app, and it wins hands down on the time-to-first-data metric.
I don't know how to reconcile this with the page's emphasis on interacting with AI-relevant datasets, which are commonly several orders of magnitude larger than this. What AI problem has ever involved less than tens of MB of data? I think only toy problems and datasets could plausibly be that small (e.g. the training images for the classic MNIST dataset are 47 MB, and the whole dataset is 55 MB: https://www.kaggle.com/datasets/hojjatk/mnist-dataset?select... ).
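The MNIST figures can be reproduced from the IDX file layout: an image file is a 16-byte header (magic number, count, rows, cols as 4-byte integers) followed by one byte per pixel, and a label file is an 8-byte header followed by one byte per label. A rough sketch of that arithmetic, assuming the Kaggle download is just the four standard uncompressed IDX files (60,000 training and 10,000 test images at 28x28):

```python
# Uncompressed MNIST sizes derived from the IDX format layout.
# Assumption: the Kaggle copy is the four standard uncompressed IDX files.

IMAGE_HEADER_BYTES = 16  # magic, count, rows, cols (4 bytes each)
LABEL_HEADER_BYTES = 8   # magic, count (4 bytes each)
ROWS, COLS = 28, 28      # one unsigned byte per greyscale pixel


def image_file_bytes(n_images: int) -> int:
    """Size of an IDX3 file holding n_images 28x28 images."""
    return IMAGE_HEADER_BYTES + n_images * ROWS * COLS


def label_file_bytes(n_labels: int) -> int:
    """Size of an IDX1 file holding n_labels one-byte labels."""
    return LABEL_HEADER_BYTES + n_labels


files = {
    "train-images-idx3-ubyte": image_file_bytes(60_000),
    "train-labels-idx1-ubyte": label_file_bytes(60_000),
    "t10k-images-idx3-ubyte": image_file_bytes(10_000),
    "t10k-labels-idx1-ubyte": label_file_bytes(10_000),
}
total = sum(files.values())

for name, size in files.items():
    print(f"{name:25s} {size:>12,d} bytes  ({size / 1e6:.1f} MB)")
print(f"{'total':25s} {total:>12,d} bytes  ({total / 1e6:.1f} MB)")
```

This gives 47,040,016 bytes (about 47 MB) for the training images and 54,950,048 bytes (about 55 MB) in total, so even this toy dataset is already larger than a 30-40 MB wasm blob.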