Hacker News new | ask | show | jobs
by ogrisel 642 days ago
Note that NumPy, CuPy and PyTorch are all involved in the definition of a shared subset of their API:

https://data-apis.org/array-api/

So it's possible to write array API code that consumes arrays from any of those libraries and delegate computation to them without having to explicitly import any of them in your source code.

The only limitation for now is that PyTorch (and to some lower extent cupy as well) array API compliance is still incomplete and in practice one needs to go through this compatibility layer (hopefully temporarily):

https://data-apis.org/array-api-compat/

3 comments

It's interesting to see hardware/software/API co-development in practice again.

The last time I think this happen at market-scale was early 3d accelerator APIs? Glide/opengl/directx. Which has been a minute! (To a lesser extent CPU vectorization extensions)

Curious how much of Nvidia's successful strategy was driven by people who were there during that period.

Powerful first mover flywheel: build high performing hardware that allows you to define an API -> people write useful software that targets your API, because you have the highest performance -> GOTO 10 (because now more software is standardized on your API, so you can build even more performant hardware to optimize its operations)

An excellent example of Array API usage can be found in scikit-learn. Estimators written in NumPy are now operable on various backends courtesy of Array API compatible libraries such as CuPy and PyTorch.

https://scikit-learn.org/stable/modules/array_api.html

Disclosure: I'm a CuPy maintainer.

And of course the native Python solution is memoryview. If you need to inter-operate with libraries like numpy but you cannot import numpy, use memoryview. It is specifically for fast low-level access which is why it has more C documentation than Python documentation: https://docs.python.org/3/c-api/memoryview.html