Okay, that's a lot of questions. There are slice examples here, for all the different ways these arrays can be sliced: https://awkward-array.readthedocs.io/en/latest/_auto/ak.Arra...

That includes what I think you mean by "masking." (You mean keeping only the array elements that are `true` in a boolean array? There's another function we call ak.mask that keeps all array elements, but replaces the ones that line up with `false` with a missing value: https://awkward-array.readthedocs.io/en/latest/_auto/ak.mask...)

If you have irregular-length lists and you want to make them all the same length, that's padding, ak.pad_none: https://awkward-array.readthedocs.io/en/latest/_auto/ak.pad_... What's "un-padding"?

Mapping is implicit, as it is in NumPy. If you use an Awkward Array in any NumPy ufunc, including binary operators like `+`, `-`, `*`, `==`, `<`, `&`, etc., then all the arrays will be broadcasted and computed element-by-element. This is true whether the data structure is flat or a deep tree. ("Array-oriented" is a different style from "functional.")

There hasn't been much call for grouping yet (Awkward Array is more like a NumPy extension than a Pandas extension), but there is a way to do it by combining a few functions, which is described in the ak.run_lengths documentation: https://awkward-array.readthedocs.io/en/latest/_auto/ak.run_...

For wrapping a function with an ABI interface, I think the easiest way to do that would be to use Numba and ctypes.

import ctypes
import awkward as ak
import numba as nb
libm = ctypes.cdll.LoadLibrary("/lib/x86_64-linux-gnu/libm.so.6")
libm_exp = libm.exp
libm_exp.argtypes = (ctypes.c_double,)
libm_exp.restype = ctypes.c_double
libm_exp(0) # 1.0
libm_exp(1) # 2.718281828459045
libm_exp(10) # 22026.465794806718
@nb.vectorize([nb.float64(nb.float64)])
def ufunc_exp(x):
    return libm_exp(x)
array = ak.Array([[0.0, 1.1, 2.2], [], [3.3, 4.4], [5.5], [6.6, 7.7, 8.8, 9.9]])
ufunc_exp(array) # calls libm_exp on every value in array, returns an array of the same structure
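The library path in the snippet above is specific to one Linux layout. As a hedged aside, here is a minimal sketch of locating libm more portably with the standard library's `ctypes.util.find_library`. The fallback name `"libm.so.6"` and the variable names `libm_path` and `result` are assumptions for illustration, not part of the original code.

```python
import ctypes
import ctypes.util
import math

# find_library returns None when it cannot locate the library; the fallback
# name "libm.so.6" is an assumption that only holds on glibc-based Linux.
libm_path = ctypes.util.find_library("m") or "libm.so.6"
libm = ctypes.CDLL(libm_path)

# Same declarations as in the original snippet: exp takes and returns a C double.
libm_exp = libm.exp
libm_exp.argtypes = (ctypes.c_double,)
libm_exp.restype = ctypes.c_double

result = libm_exp(1.0)  # should agree with math.e to floating-point precision
```

The rest of the example (the `@nb.vectorize` wrapper and the call on an Awkward Array) would be unchanged; only the way `libm` is found differs.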
Numba's @vectorize decorator (https://numba.pydata.org/numba-doc/latest/user/vectorize.htm...) makes a ufunc, and Awkward Array knows how to implicitly map ufuncs. (It is necessary to specify the signature in the @vectorize argument; otherwise, it won't be a true ufunc and Awkward won't recognize it.)

When Numba's JIT encounters a ctypes function, it goes to the ABI source and inserts a function pointer in the LLVM IR that it's generating. Unfortunately, that means that there is function-pointer indirection on each call, and whether that matters depends on how long-running the function is. If you mean that your assembly function is 0.1 ns per call or something, then yes, that function-pointer indirection is going to be the bottleneck. If you mean that your assembly function is 1 μs per call and that's fast, given what it does, then I think it would be alright.

If you need to remove the function-pointer indirection and still run on Awkward Arrays, there are other things we can do, but they're more involved. Ping me in a GitHub Issue or Discussion on https://github.com/scikit-hep/awkward-1.0
Oh, and for un-padding, I meant: how do I do the inverse of `fill_none . pad_none`?
I also saw there was some stuff about algebraic types (e.g. semigroup reductions). Is that kind of algorithm-level type annotation a direction you all are interested in exploring further?