by jpivarski
1652 days ago
Un-padding: something like string-trimming (e.g. `str.rstrip`), but for missing values at the ends of lists... There isn't a function for that. If you happen to know that the only uses of missing values are at the ends of lists, `ak.is_none` and `ak.sum` (with the appropriate `axis`) can count them, and you could perhaps construct a slice from that (negative to count from the end, and therefore slice off the missing values only). I'd have to think about it, but that would be the beginning of a columnar implementation of "unpad_none".
As for the algebraic types, I was using the terminology to explain what the reducers do. Some operations, like sum and product, have identities, and some, like argmin, don't.
As for type annotations, I don't know what you mean. We're not using Python type annotations, and they'd be too coarse to describe what these operations do anyway. Awkward-specialized type annotations might be overkill. For Dask, which needs to be able to predict types, we're passing tracer objects through the codebase to observe how the types change without actually computing values, so it's type propagation by execution.
|
So I was wondering how I could exploit Awkward’s typing system to use/implement some goodies from Haskell a la https://wiki.haskell.org/Typeclassopedia
For instance, what if I could make an array of heterogeneous ufuncs and apply it to a similarly shaped array (like an Applicative)? Say I wanted to implement graph rewriting by applying an array of rule ufuncs to an adjacency array, or even, to get very meta, apply one array of rule functions to another array of rule functions.
Or if I wanted to compute, e.g., the fixed point of a series of those applications.
Or maybe I could use Arrow types to abstractly represent the computation within each cell, do some fancy stuff in each cell, and perform some rudimentary "compiler optimization" by inspecting which cells would end up doing unnecessary work (in the context of whatever problem I am working on; e.g. suppose I only permitted 3 chained ufunc calls per cell, or something weird like that). That would be really cool too.
Or if, for some unknown reason, I wanted each cell to fire off 2 concurrent ufuncs and was only interested in the result that "won" the data race in that cell, I could use an Alternative in the style of the Concurrently library.
Or if I wanted each cell to be like a MonadPlus: do some work in the cell, but also provide built-in "recovery" capabilities per cell if the cell evaluated to empty/missing/None.
Ah, now another interesting possibility could be a matrix of lambda calculus statements!
Musings and sketches.. :)
Very very cool work indeed!