Hacker News
by wodenokoto 1934 days ago
It seems like it has some nice advanced features that the data engineering team might appreciate once an application gets large.

But as the person who needs to load up some data and do some transformations, this article gives me very little information about why I should switch from pandas.

But I am excited to hear about new solutions in the data frame space!

If you're able to comfortably do your processing in Pandas, I don't think there is any justification to switch to Vaex. But Pandas begins to strain in the GB territory. If you switch to Vaex at that point, it'll be night and day. Working from the REPL, no more half-second pauses for results. And of course the payoff only grows with more data.

Vaex is stupid fast at all the data operations it supports, to the point where I've used it in place of a database for an API.

Thanks, glad you find Vaex useful.

Indeed, for small data there is not much to gain; at least, that is not the focus of this article. Even with small amounts of data, though, the automatic pipelines are useful: https://vaex.io/blog/ml-impossible-train-a-1-billion-sample-...

I was more concerned about its api / methods.

Does it make things hard that were easy in pandas, or does it make things easy that are hard in pandas?

I'm coming from a Pandas-dominated codebase. Working with Vaex, I felt the interface was almost 1 for 1. I have a note from then about joins being more awkward than with Pandas. If I recall correctly, that was mostly because Pandas' joins are more flexible; most of the functionality was there.

At the time, I had issues with some string operations, though it appears that with v4.0 that may no longer be the case.