by pizza
1651 days ago
Do you have some examples of slicing, masking, un-padding, and (I suppose) “Haskell-like” ops, e.g. fmap, but also e.g. treemap, vmap, pmap that are in JAX? Also grouping, cutting, and interweaving. And also, this is kind of a weird ask, but suppose I had an operation in pure assembly that is extremely fast, with two int64 parameters outputting one int64: what’s the easiest path for me to get Awkward to apply that to two Arrays and give me one Array back as output?
There are slice examples here, for all the different ways these arrays can be sliced: https://awkward-array.readthedocs.io/en/latest/_auto/ak.Arra...
That includes what I think you mean by "masking." (You mean keeping only the array elements that are `true` in a boolean array? There's another function we call ak.mask that keeps all array elements, but replaces the ones that line up with `false` with a missing value: https://awkward-array.readthedocs.io/en/latest/_auto/ak.mask...)
If you have irregular-length lists and you want to make them all the same length, that's padding, ak.pad_none: https://awkward-array.readthedocs.io/en/latest/_auto/ak.pad_... What's "un-padding"?
Mapping is implicit, as it is in NumPy. If you use an Awkward Array in any NumPy ufunc, including binary operators like `+`, `-`, `*`, `==`, `<`, `&`, etc., then all the arrays will be broadcast against each other and computed element-by-element. This is true whether the data structure is flat or a deep tree. ("Array-oriented" is a different style from "functional.")
There hasn't been much call for grouping yet—Awkward Array is more like a NumPy extension than a Pandas extension—but there is a way to do it by combining a few functions, which is described in the ak.run_lengths documentation: https://awkward-array.readthedocs.io/en/latest/_auto/ak.run_...
For wrapping a function that is only reachable through its ABI, like your assembly routine, I think the easiest way would be to use Numba and ctypes.
Numba's @vectorize decorator (https://numba.pydata.org/numba-doc/latest/user/vectorize.htm...) makes a ufunc, and Awkward Array knows how to implicitly map ufuncs. (It is necessary to specify the signature in the @vectorize argument; otherwise, it won't be a true ufunc and Awkward won't recognize it.)
When Numba's JIT encounters a ctypes function, it goes to the ABI source and inserts a function pointer in the LLVM IR that it's generating. Unfortunately, that means that there is function-pointer indirection on each call, and whether that matters depends on how long-running the function is. If you mean that your assembly function is 0.1 ns per call or something, then yes, that function-pointer indirection is going to be the bottleneck. If you mean that your assembly function is 1 μs per call and that's fast, given what it does, then I think it would be alright.
If you need to remove the function-pointer indirection and still run on Awkward Arrays, there are other things we can do, but they're more involved. Ping me in a GitHub Issue or Discussion on https://github.com/scikit-hep/awkward-1.0