Hacker News new | ask | show | jobs
by luispedrocoelho 3057 days ago
If I can fit the code into numpy-like structure, then Python is typically fine.

The issue is when I cannot.

3 comments

Then move the function to a pyx file and build it with Cython. Problem solved.
Also look into numba as a jit decorator for python functions.
Have you given dask a try? It gives you out-of-core arrays with numpy semantics and distributed computing.
Dask doesn't solve that problem since it's a wrapper around pandas functions.

If you can't make the core pandas code decently fast, dask won't save you.

dask.dataframe might not help but dask.distributed could in that case.

I've had success using it on non vanilla stuff (i.e. code that could not get converted to play natively with numpy/pandas structures)

As a bonus, the nice profiling tools (built within dask) have also helped me improve the performance of the code.

See https://distributed.readthedocs.io/en/latest/

There’s dask.array which works on numpy arrays instead of dataframes. Otherwise, your argument holds.