| HN Mirror

by lmeyerov 1337 days ago

agreed, reading this article was confusing, the python baseline is far from our reality

for reference, we're aiming for 1-100 GB / second, per server, in our python etl+ml+viz pipelines
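to put that range in perspective, here's a rough back-of-envelope sketch in plain python. it's only illustrative arithmetic, not our actual pipeline: the helper names (`throughput_gb_per_s`, `scan_time_s`, `measure`) and the 1 TB dataset size are made up for the example, and GB is assumed to mean 10^9 bytes.

```python
import time

GB = 1e9  # assumption: decimal gigabyte (10^9 bytes)


def throughput_gb_per_s(n_bytes: int, seconds: float) -> float:
    """Bytes processed over elapsed wall-clock time, expressed in GB/s."""
    return n_bytes / GB / seconds


def scan_time_s(dataset_gb: float, gb_per_s: float) -> float:
    """Seconds needed for one full pass over a dataset at a sustained rate."""
    return dataset_gb / gb_per_s


def measure(step, payload: bytes) -> float:
    """Time a single call of `step` on `payload` and return its GB/s."""
    start = time.perf_counter()
    step(payload)
    elapsed = time.perf_counter() - start
    return throughput_gb_per_s(len(payload), elapsed)


if __name__ == "__main__":
    # One pass over a hypothetical 1 TB (1000 GB) dataset at each target rate.
    for rate in (1, 10, 100):
        print(f"1 TB at {rate} GB/s: {scan_time_s(1000, rate):.0f} s")

    # Measure a pure-python step (summing bytes) on an 8 MB buffer.
    payload = bytes(8 * 1024 * 1024)
    print(f"pure-python sum(): {measure(sum, payload):.3f} GB/s")
```

at 1, 10, and 100 GB/s, one pass over 1 TB takes 1000 s, 100 s, and 10 s respectively. the last line gives a per-machine number for an interpreted byte loop to compare against those targets; the result will vary by hardware.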

interestingly, duckdb+polars are nice for small non-etl/ml workloads, but once it's analytical processing, we use cudf / dask_cudf for much better perf per watt / $. I'd love the low overhead & typing benefits of polars, but as soon as you start looking at GB+/s and the occasional bigger-than-memory dataset, the core sw+hw needs to change a bit, end-to-end

(and if folks are into graph-based investigations, we're hiring backend/infra :) )