by makmanalp
3118 days ago
> pandas is absolutely terrible compared to the dplyr, data.table, or even base R for data manipulation.

I would really like to hear a bit more about this, because it would greatly increase my motivation to learn more R. Specifically, I've fiddled around with dplyr and it definitely feels more DSL-y, but I didn't see a crazy benefit there. What are some of your favourite things about dplyr / data.table?

data.table's advantage lies in its speed. It is by far the fastest of the three options (pandas, dplyr, data.table). In just about every benchmark it is either significantly faster than pandas or, at the very least, approximately equal.
Pandas is lauded by people who strictly use Python, and it really is fantastic considering how ridiculous data manipulation would be in Python without it. But it's also the only option a Python user really has, so they've become married to the idea that it is the best.
Basically, if you are using Python, use pandas. If you have an option, go for data.table for speed, dplyr for clarity, or a mix of the two if desired.