by qwhelan 2355 days ago
FWIW it looks like pandas is slow/OOM-ing because the benchmarks use Categoricals exclusively, and pandas users don't lean on those nearly as heavily as R users do.

In particular, I suspect the benchmark sizing is forcing a fallback from numpy's int64 to Python ints as categorical labels, which could easily explain a 10x or greater difference.
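
To illustrate the kind of gap being described, here is a minimal numpy-only sketch. It is not the benchmark's code and does not touch pandas internals; the array size and the names `native`, `boxed`, and `time_sum` are made up for this example. It only compares summing the same integers stored as int64 versus as boxed Python ints in an object array.

```python
import timeit

import numpy as np

n = 1_000_000  # hypothetical size, chosen only for illustration
native = np.arange(n, dtype=np.int64)  # fixed-width machine integers
boxed = native.astype(object)          # same values, stored as Python int objects


def time_sum(arr, repeats=5):
    """Return the best-of-N wall time, in seconds, for summing arr."""
    return min(timeit.repeat(arr.sum, number=1, repeat=repeats))


if __name__ == "__main__":
    t_native = time_sum(native)
    t_boxed = time_sum(boxed)
    print(f"int64:  {t_native:.5f}s")
    print(f"object: {t_boxed:.5f}s ({t_boxed / t_native:.0f}x slower)")
```

The exact ratio depends on the machine and the operation, so treat the printed number as a rough indication rather than a measurement of what the benchmark hit.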