by devin-petersohn
1928 days ago
I'm one of the maintainers of Modin, so I can chime in here. Dataframes are the focus of my PhD thesis, and Modin started as my PhD project.

Most of the differences come down to functionality and support. Truthfully, the goals of the projects are quite different, so it's a bit of an apples-to-oranges comparison. As part of developing Modin, we identified a low-level algebra and data model that generalizes and encompasses all of the pandas and R dataframe functionality. Modin is an implementation of this data model and algebra[1].

Based on our studies, Vaex's architecture can support somewhere in the range of 35-40% of the pandas DataFrame API, a figure that reflects its lack of support for row indexes. Compare this to Dask, currently at 44% of the pandas API, and Modin, currently at 90%.

Vaex is great if you're already working with a compatible memory-mapped file format; it'll be exceptionally fast in that case. That is the use case I believe they are (successfully) targeting.

[1] https://arxiv.org/pdf/2001.00888