by nuisance-bear 1895 days ago
Tools to make GPU development easier are sorely needed.

I foolishly built an options pricing engine on top of PyTorch, thinking "oooh, it's a fast array library that supports CUDA transparently", only to find out that array indexing is 100x slower than in NumPy.

3 comments

You might be interested in Legate [1]. It implements the NumPy interface as a drop-in replacement and runs on GPUs as well as distributed machines. You can see their performance results for yourself; they're not far off from hand-tuned MPI.

[1]: https://github.com/nv-legate/legate.numpy

Disclaimer: I work on the library Legate uses for distributed computing, but otherwise have no connection.

Interesting find about the indexing. I just had the opposite experience: I swapped from NumPy to torch in a project and got a 2000x speedup on some indexing and basic maths wrapped in autodiff. And I haven't moved it onto CUDA yet.

Here's an example that illustrates the phenomenon. If memory serves me right, index latency is superlinear in dimension count.

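   import time, torch
   from itertools import product

   N = 100

   ten = torch.randn(N, N, N)
   arr = ten.numpy()  # NumPy view that shares the tensor's memory

   def indexTimer(val):
       # Time N**3 scalar reads, one element per indexing call.
       start = time.perf_counter()
       for i, j, k in product(range(N), range(N), range(N)):
           x = val[i, j, k]
       end = time.perf_counter()
       print('{:.2f}'.format(end - start))

   indexTimer(ten)  # torch.Tensor
   indexTimer(arr)  # numpy.ndarray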
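
For comparison, here is a minimal sketch of doing the same reads in a single indexing call instead of N**3 separate ones. It assumes the slowdown comes mostly from a fixed per-call cost on each `val[i, j, k]`; that is an assumption, not something measured in this thread. The names `idx`, `batched` and `batched_np` are made up for illustration, and N is shrunk to 20 just to keep it quick.

```python
import numpy as np
import torch

N = 20  # smaller than the N = 100 above, purely to keep the example quick

ten = torch.randn(N, N, N)
arr = ten.numpy()  # NumPy view that shares the tensor's memory

# All (i, j, k) triples, in the same order itertools.product would yield them.
r = torch.arange(N)
idx = torch.cartesian_prod(r, r, r)  # shape (N**3, 3)
i, j, k = idx.unbind(dim=1)

# One advanced-indexing call per library instead of N**3 scalar reads.
batched = ten[i, j, k]
batched_np = arr[i.numpy(), j.numpy(), k.numpy()]

print(batched.shape, batched_np.shape)
```

Whether this helps a real pricing engine depends on whether the access pattern can be expressed as index arrays up front; if each read depends on the previous result, it can't be batched this way.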

>>> built an options pricing engine on top of PyTorch

I'd love to hear more about this! Do you have any posts or write-ups on this?