by rickard 1830 days ago
I recently (last week) started using numba, for similar reasons to why the author seems to like Julia. I tested translating his example to numba:

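  import numba
  import numpy as np

  # k and a are assumed to be defined as in the article's example
  @numba.njit(parallel=True, fastmath=True)
  def w(M, a):
      n = len(a)
      for i in numba.prange(n):  # outer loop is split across threads
          for j in range(n):
              M[i,j] = np.exp(1j*k * np.sqrt(a[i]**2 + a[j]**2))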
and timed it like this:

  %%timeit
  n = len(a)
  M = np.zeros((n,n), dtype=complex)
  w(M, a)
On my 8-core system, this ends up more than 10x as fast as the numpy version he listed (which seems to lack the sqrt, though). That would place it close to the multithreaded Julia version, even allowing for the fact that he ran that on a 4-core system. As an added bonus, it can be translated to GPU pretty much automatically.
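
For comparing against w, a plain numpy version that does include the sqrt could look roughly like the sketch below. This is not the article's code: w_numpy is a made-up name, and k = 2.0 and the small test array are arbitrary values chosen only for illustration.

```python
import numpy as np

def w_numpy(a, k):
    # Broadcasting builds the full n x n grid of sqrt(a[i]**2 + a[j]**2)
    # without an explicit Python loop.
    r = np.sqrt(a[:, None] ** 2 + a[None, :] ** 2)
    return np.exp(1j * k * r)

# Arbitrary illustration values, not taken from the article.
a = np.linspace(0.0, 1.0, 4)
M = w_numpy(a, k=2.0)
```

The result can be checked against the numba output with np.allclose. Note that this version allocates a temporary n x n array for r, which the explicit loop avoids.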

The rise of numba shows us why numpy and "just write vectorized-style code with a C++ backend" are not enough.

Yet Numba basically makes your Python code not Python. It doesn't support many things: pandas DataFrames, or even something as simple as a dict(), which means you often have to manually feed your numba function separate arguments.

Separating a complicated calculation into the parts numba can infer types for and the parts it can't is not fun, and sometimes just impossible.
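
The kind of split described here might look like the sketch below: an ordinary Python wrapper pulls plain arrays out of a dict, and only the numeric loop is compiled. The names weighted_sum and run and the dict keys are hypothetical, and the fallback decorator is there only so the sketch still runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs as plain Python without numba.
    def njit(func):
        return func

@njit
def weighted_sum(values, weights):
    # Compiled part: sees only typed numpy arrays, no dicts or DataFrames.
    total = 0.0
    for i in range(values.shape[0]):
        total += values[i] * weights[i]
    return total

def run(config):
    # Ordinary Python part: unpack the container into separate arguments.
    values = np.asarray(config["values"], dtype=np.float64)
    weights = np.asarray(config["weights"], dtype=np.float64)
    return weighted_sum(values, weights)

result = run({"values": [1.0, 2.0, 3.0], "weights": [0.5, 0.25, 0.25]})
```

With two arrays this is painless; the complaint above is about what happens when the wrapper has to unpack a dozen of them.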

Yep, completely agree. For the project I'm currently doing, it seems like a fairly good fit though. Lots of prototyping of different approximations, and it needs to be faster than plain numpy.

Also, the jitclass things help somewhat. I use them as plain data containers with no methods, to work around the hideously long argument lists that would otherwise be required. jitclass breaks the GPU option, though.
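
A rough sketch of the data-container idea, assuming a recent numba where jitclass lives in numba.experimental. Params, fill, and the fields k and a are made-up names for illustration; the only method is the constructor that fills the fields. The fallback branch exists only so the sketch runs as plain Python without numba.

```python
import numpy as np

try:
    from numba import njit, float64
    from numba.experimental import jitclass
    # Field types must be declared up front for a jitclass.
    SPEC = [("k", float64), ("a", float64[:])]
except ImportError:
    # Fallback so the sketch still runs as plain Python without numba.
    SPEC = None

    def njit(func):
        return func

    def jitclass(spec):
        def wrap(cls):
            return cls
        return wrap

@jitclass(SPEC)
class Params:
    # Plain data container: nothing beyond the constructor.
    def __init__(self, k, a):
        self.k = k
        self.a = a

@njit
def fill(M, p):
    # One container argument instead of a long list of separate ones.
    n = p.a.shape[0]
    for i in range(n):
        for j in range(n):
            M[i, j] = np.exp(1j * p.k * np.sqrt(p.a[i] ** 2 + p.a[j] ** 2))

a = np.array([0.0, 3.0, 4.0])
p = Params(2.0, a)
M = np.zeros((3, 3), dtype=np.complex128)
fill(M, p)
```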

I have personally experimented quite a lot with numba. When it works, it's great. However, numba can have very cryptic error messages, which makes it difficult to debug.

That is why I switched to dask, which, even though slower, integrates better with numpy.

Actually, Numba does support dicts now. (You can't have a mix of types in the dicts unless that's changed, but that isn't an actual problem for most ML work.) I have used numba very effectively to make my machine learning research projects run very quickly. I don't use pandas; I do use a lot of numpy and scipy. I understand that pandas can use numpy arrays for at least some things. Since numba works great with numpy, it seems like that might be an approach for using it with pandas, in at least some cases.