Hacker News
by moelf 1831 days ago
The rise of Numba shows us why NumPy and "just write vectorized-style code with a C++ backend" are not enough.

Yet Numba basically makes your Python code not Python. There is a lot it doesn't support: pandas DataFrames, or even something as simple as a dict(), which means you often have to manually pass each piece of data to your Numba function as a separate argument.

Splitting a complicated calculation into the parts Numba can infer types for and the parts it can't is not fun, and sometimes just impossible.
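To make the "separate arguments" point concrete, here is a minimal sketch of the usual split: a plain-Python wrapper unpacks the container, and the compiled kernel only ever sees NumPy arrays. The names weighted_sum, weighted_sum_from_table and the "value"/"weight" columns are made up for illustration; with a pandas DataFrame the wrapper would pull the columns out with something like df["value"].to_numpy() instead. The fallback decorator is only there so the sketch still runs without Numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(func):  # fallback: run the kernel as plain Python
        return func


@njit
def weighted_sum(values, weights):
    # Compiled kernel: takes each column as its own array argument
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * weights[i]
    return total


def weighted_sum_from_table(table):
    # Plain-Python wrapper: unpack the container, then call the kernel
    values = np.asarray(table["value"], dtype=np.float64)
    weights = np.asarray(table["weight"], dtype=np.float64)
    return weighted_sum(values, weights)


# Hypothetical column-oriented table standing in for a DataFrame
table = {"value": [1.0, 2.0, 3.0], "weight": [0.5, 0.25, 0.25]}
result = weighted_sum_from_table(table)
```

With two columns this is tolerable; the pain described above starts when the kernel needs a dozen of them.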

3 comments

Yep, completely agree. For the project I'm currently doing, it seems like a fairly good fit though. It involves lots of prototyping of different approximations, and it needs to be faster than plain NumPy.

Also, jitclass helps somewhat. I use jitclasses as plain data containers with no methods, to work around the hideously long argument lists that would otherwise be required. jitclass breaks the GPU option, though.
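A rough sketch of that data-container pattern, assuming a Numba version that ships numba.experimental.jitclass. The Particles class, its two fields and the kinetic_energy function are hypothetical, and the class has an __init__ only because the fields have to be assigned somewhere. Without Numba the same code runs as ordinary Python.

```python
import numpy as np

try:
    from numba import njit, float64
    from numba.experimental import jitclass
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

    def njit(func):  # fallback: run as plain Python
        return func


class Particles:
    # Data only: __init__ stores the fields, no other methods
    def __init__(self, mass, velocity):
        self.mass = mass
        self.velocity = velocity


if HAVE_NUMBA:
    # Numba needs the type of every field declared up front
    spec = [("mass", float64[:]), ("velocity", float64[:])]
    Particles = jitclass(spec)(Particles)


@njit
def kinetic_energy(p):
    # One container argument instead of one argument per array
    total = 0.0
    for i in range(p.mass.shape[0]):
        total += 0.5 * p.mass[i] * p.velocity[i] ** 2
    return total


particles = Particles(np.array([1.0, 2.0]), np.array([2.0, 3.0]))
energy = kinetic_energy(particles)
```

The function signature stays at one argument no matter how many fields the container grows.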

I have personally experimented quite a lot with Numba. When it works, it's great. However, its error messages can be very cryptic, which makes debugging difficult.

That is why I switched to Dask, which, even though it is slower, integrates better with NumPy.

Actually, Numba does support dicts now. (You can't mix types within a dict unless that's changed, but that isn't an actual problem for most ML work.) I have used Numba very effectively to make my machine learning research projects run very quickly. I don't use pandas; I do use a lot of NumPy and SciPy. I understand that pandas can use NumPy arrays for at least some things. Since Numba works great with NumPy, that might be a way to use it with pandas, in at least some cases.
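For reference, a small sketch of the dict support, assuming a Numba version that provides numba.typed.Dict. Keys and values each have one fixed type, which is the "no mix of types" restriction. The names make_totals and accumulate are invented for this example, and the fallback uses a normal dict so the sketch also runs without Numba.

```python
import numpy as np

try:
    from numba import njit, types
    from numba.typed import Dict

    def make_totals():
        # Typed dict: every key is int64, every value is float64
        return Dict.empty(key_type=types.int64, value_type=types.float64)
except ImportError:
    def njit(func):  # fallback: run as plain Python
        return func

    def make_totals():
        return {}


@njit
def accumulate(totals, keys, values):
    # Sum the values per key, updating the dict in place
    for i in range(keys.shape[0]):
        k = keys[i]
        if k in totals:
            totals[k] = totals[k] + values[i]
        else:
            totals[k] = values[i]


totals = make_totals()
keys = np.array([1, 2, 1], dtype=np.int64)
values = np.array([0.5, 1.0, 1.5])
accumulate(totals, keys, values)
```

After the call, totals holds one summed float per integer key and can be read from ordinary Python code.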