by musingsole
1934 days ago
If you're able to comfortably do your processing in Pandas, I don't think there is any justification to switch to Vaex. But Pandas begins to strain once you get into the gigabyte range. If you switch to Vaex at that point, it'll be night and day: working from the REPL, no more half-second pauses for results. And of course the payoff only grows with more data. Vaex is stupid fast at all the data operations it supports, to the point where I've used it in place of a database for an API.
Indeed, for small data there is not much to gain; at least, that is not the focus of this article. Although even with small amounts of data, the automatic pipelines are useful: https://vaex.io/blog/ml-impossible-train-a-1-billion-sample-...