by oddthink 4857 days ago
Aside from the performance differences, data.table makes interactive manipulation very easy, at the cost of being hard to write programs against. Pandas currently goes in the opposite direction.

I'd rather have R/data.table at the prompt and python/pandas in my script, but if you have to err on one side, the python/pandas "low magic" is the side to err on. Pandas does have its own strange corners, though. For example, it seems like it tries hard to stick similar-typed columns into contiguous matrices, which leads to some unexpected casting, and I have no idea what the supposed benefit is over just keeping distinct columns.
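To make the "unexpected casting" concrete, here is a minimal sketch of a few places where pandas silently upcasts integers to floats. Which cases the comment actually had in mind is an assumption on my part; the frame and the column names `a` and `b` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one integer column, one float column.
df = pd.DataFrame({
    "a": np.array([1, 2, 3], dtype="int64"),
    "b": np.array([0.5, 1.5, 2.5], dtype="float64"),
})

# Each column keeps its own dtype while it stays inside the frame.
col_dtypes = df.dtypes.astype(str).to_dict()

# Pulling out a row gives a Series with one dtype, so the int is upcast.
row_dtype = str(df.iloc[0].dtype)

# Converting the whole frame to a single array also picks a common dtype.
arr_dtype = str(df.to_numpy().dtype)

# Reindexing to a missing label introduces NaN, which forces int -> float.
reindexed_dtype = str(df["a"].reindex([0, 1, 5]).dtype)

print(col_dtypes, row_dtype, arr_dtype, reindexed_dtype)
```

None of this shows how the columns are laid out in memory; it only shows the casting you can observe from the outside.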

1 comment

I'd guess the benefits are related to performance - Wes is known as something of a speed junkie (see also his vbench project). I know there's quite a bit of code in pandas that makes it much faster than a naive implementation of a similar interface.

That said, if it causes unexpected behaviour, check to see whether it's a bug.