Hacker News
by pletnes 3058 days ago
That’s easy with Python, too, in a lot of number-crunching cases. NumPy with MKL will use all your cores, as will e.g. dask and other libraries built on NumPy. Farming out embarrassingly parallel work to threads or processes is also easy.
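As a rough illustration of the "farming out" part, here is a minimal sketch using only the standard library's concurrent.futures. The names work and run_parallel are made up for this example, and the toy task stands in for whatever independent computation you actually have.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def work(n):
    """Toy CPU-bound task (hypothetical): sum of integer square roots below n."""
    return sum(math.isqrt(i) for i in range(n))


def run_parallel(inputs, executor_cls=ProcessPoolExecutor, max_workers=None):
    """Map `work` over independent inputs using a pool of workers.

    Processes sidestep the GIL for pure-Python work; threads are usually
    enough when the heavy lifting happens in code that releases the GIL
    (for example, many NumPy routines).
    """
    with executor_cls(max_workers=max_workers) as pool:
        # pool.map preserves the order of the inputs in its results
        return list(pool.map(work, inputs))
```

When using the process-based default from a script, call run_parallel(...) under an `if __name__ == "__main__":` guard so worker processes can import the module cleanly. Passing executor_cls=ThreadPoolExecutor switches to threads without changing anything else.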
2 comments

If I can fit the code into numpy-like structure, then Python is typically fine.

The issue is when I cannot.

Then move the function to a .pyx file and build it with Cython. Problem solved.
Also look into Numba, which provides a JIT decorator for Python functions.
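To make the Numba suggestion concrete, here is a small sketch of the kind of loop that doesn't fit neatly into vectorized NumPy: a running total that is clamped at every step, so each value depends on the previous one. The function clipped_cumsum is a made-up example, and the ImportError fallback is only there so the snippet still runs (slowly) without Numba installed.

```python
import numpy as np

try:
    from numba import njit  # Numba's nopython-mode JIT decorator
except ImportError:
    # Assumption for illustration only: without Numba, run as plain Python.
    def njit(func):
        return func


@njit
def clipped_cumsum(deltas, lo, hi):
    """Running sum of `deltas`, clamped to [lo, hi] after every step."""
    n = deltas.shape[0]
    out = np.empty(n)
    level = 0.0
    for i in range(n):
        level = level + deltas[i]
        # The clamp feeds back into the next iteration, which is what
        # makes this hard to express as a single vectorized operation.
        if level < lo:
            level = lo
        elif level > hi:
            level = hi
        out[i] = level
    return out
```

The first call pays a compilation cost; later calls with the same argument types reuse the compiled code. Whether this is fast enough for a given workload is something you'd have to measure.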
Have you given dask a try? It gives you out-of-core arrays with NumPy semantics, plus distributed computing.
Dask doesn't solve that problem since it's a wrapper around pandas functions.

If you can't make the core pandas code decently fast, dask won't save you.

dask.dataframe might not help but dask.distributed could in that case.

I've had success using it on non-vanilla stuff (i.e. code that could not be converted to work natively with numpy/pandas structures).

As a bonus, the nice profiling tools built into dask have also helped me improve the performance of the code.

See https://distributed.readthedocs.io/en/latest/

There’s also dask.array, which works on NumPy arrays instead of dataframes. Otherwise, your argument holds.
Or use languages where you don't have to resort to these extreme workarounds for what should happen by default.