by sammysidhu
1111 days ago
Hi! One of the Daft maintainers here. Thanks for the feedback. Ultimately, you're right that supporting the full Polars syntax in a distributed fashion is very difficult. There are libraries out there that do "Pandas but distributed", but from what I have seen, they prioritize API coverage over performance or memory consumption, so you end up in a similar boat to the situation you mentioned. We're trying to start with a simpler API that maps well to a distributed query that we can execute well, and then add the features that people request. I would love to know what you would want to see in Daft!
> We're trying to start with a simpler API that maps well to a distributed query that we can execute well and then add the features that people request.
That would have been a good approach in a field that has not been standardised around a single library since its infancy. Polars is beating Pandas in every possible benchmark, yet will continue to struggle for adoption "until the end". Do you really think Daft can do better? (If yes, go ahead and prove me wrong!)
As a comparison, it's like trying to introduce a new transport layer protocol (https://en.wikipedia.org/wiki/QUIC) against TCP. You can do that if and only if there are obvious benefits, no drawbacks and you are prepared to wait 15 years for 30% market share.