Hacker News new | ask | show | jobs
by casion 1009 days ago
To give a shallow overview, transducers allow you to define steps in collection processing _per item_ rather than having collection processing as a series of transformations of collections.

So rather than processing the collection, passing it to the next function that processes the collection, passing it to the next... etc.. consuming all the CPU and memory that involves, you can define steps that are applied for each item in the collection thereby having the iteration through the collection happen once.

These steps (transducers) are also composable and reusable.

I suspect you know this, consider this a basic explanation for other people reading.

3 comments

That seems like insufficient magic for the respect that transducers seem to have.

What you've written is just the Haskell list monad or Java8 Stream.flatmap.

> That seems like insufficient magic for the respect that transducers seem to have.

This is 100% correct. It's amazing how over hyped they are.

If you have ever used C#'s LINQ you are using transducers. The fact that LINQ works item at a time instead of collection at a time is all that's being discussed. That if you say take the first two in a 1,000,000 long collection LINQ will only enumerate the first 2 items and not all 1,000,000 is the other behavior. And the way to do this is by composing operations using a "." operator into a "query" to run.

C# had had this since 2007. It doesn't require a fancy name, it doesn't require streams, it doesn't require "thought pieces" every few months for over a decade for people to "master thinking in linq". Just sequence your operations and get back to coding.

And Clojure is a really great language, but the amount of mental space occupied by transducers is a bad look for the language. Via analogy it's like watching people be amazed by for loops for over a decade. And I know they're are a lot of really smart people in the clojure community, so I honestly put it on Rich. Either on hyping or up so much when he released it like he just invented sliced bread and it's a deep advanced topic, or for how it's presented in the language that people have to understand so much beneath the abstraction layer to use it correctly.

Your average blub enterprise programmer has been using LINQ for 15 years and never needed 100 thought pieces in how to use it and reason about it. Yes it's a monad, yes it let's your short circuit, yes it's item at a time, but a user doesn't need to know lots of detail to use it. It's like watching a language community that can do calculus be continuously hypong on the fact it can do long division.

Clojure is an amazing language, transducers are not that special, figure out why they are so hyped in clojure.

Or maybe I have it backwards and linq/transducers really are partial differential equations and C# snuck it into the 4th grade curriculum and nobody noticed.

I think there's a big difference between transducers work and how other lazy stream libraries work. Not too familiar with LINQ but more with Java streams and Scala views and iterators. Usually these libraries wherever the work is forced, like in a final call of `.toList()` or whatnot, they are materializing something like a stack of iterators. Each iterator calls its child/children iterators and the pipeline logic is in the stack of `.next()` calls.

The difference with transducers is that the transformation logic is completely decoupled from the fact that this is happening in collections. That is what makes them usable outside of the context of collections, most famously in core.async where you can map/filter/mapcat/flatten/take/drop channels just like you do with collections. This only works because transducers are completely decoupled from either the fact that elements are coming from, or going into, a collection. I think it is a really fascinating and creative achievement in decoupling. Java Streams and Scala view/iterator/iterable transformations can never be reused in other contexts since they are fundamentally about the collections. Whatever kind of Observable/Async Stream/Future or other places where you might want map/filter/reduce functionality has to reimplement the whole set of these operations anew (see for example Akka Streams, RxJava)

I think these languages use “Collection” as another way of saying “iterable”. So the only thing such an object needs to provide is a “next()”. So a stack of lambdas/generators all working on a “next()” seem to be doing the same “per element” work of a Clojure transducer. Am I misunderstanding?
The difference is in the way that the transformations compose.

Iterators are generic over some iterable type `U` in `E, R, T: Iterator<E, R, U|T>, U: Iterable<E>` that they either directly or indirectly capture some nested Iterator.

However that means that code composing iterators must also be generic over that iterable `U`, so you can't simply take two transformation pipelines and concatenate them, because they will both provide an element source already.

Transducers separate the transformation part and the processing part. So you can have a `E, R, T: Transform<E, R>` which doesn't have a generic type parameter for the Iterable, and a function `transduce<E, R, S: Source<E>, I: Sink<R>, T: Transform<E, R>>(source: S, tx: T) -> I` which contains all the transformation application machinery, be it iterator style `next()` operations, or stream based `pull/push` operations, and a function `comp<E, R, S>(left: Transform<E, R>, right: Transform<R, S>) -> Transform<E, S>` that composes transformations. The way transformers are implemented, this `comp` operation is actually simply function composition.

Oh I see, so with transducers, you can use the same transform pipeline (stack of functions) across different composition contexts (iterable next(), function composition, channel transforming, etc). I see the utility now!

Still I think just the “next()” context is quite enough because you can have everyone use it by convention for everything, even things that aren’t collections. Like a fluent tensor library for example. This is based on a quick understanding of transducers…

> If you have ever used C#'s LINQ you are using transducers. The fact that LINQ works item at a time instead of collection at a time is all that's being discussed. That if you say take the first two in a 1,000,000 long collection LINQ will only enumerate the first 2 items and not all 1,000,000 is the other behavior. And the way to do this is by composing operations using a "." operator into a "query" to run.

Isn't that just function composition to build the map function? I guess that where the magic can come in is that function composition is associative, which allows for some really interesting opportunities for runtime optimization that, so far as I know, haven't been seriously explored.

Yup, it's essential function composition.

If IIRC technically it's a bit more. It's a monad just like function composition, but it's bind has to handle data threading and short-circuit behavior. If you squint it's not too differently than than a parser combinator over Applicative matching a sequence of characters. Composing operations to correct thread sequential data while handing short circuiting.

I'm not certain about the optimizations due to associativity. While yes function composition is associative, that just builds the query. Running the query itself must be sequential as each operation depends on the data from the prior, leaving I believe little room for optimization.

It’s been a while since I did any C#. How do I write a function that returns something that encapsulates the operations to perform on the stream that I can then compose with other operations? And can I then apply to whatever sequence abstraction I have? Isn’t it limited to things that implement IEnumerable? I seem to remember when transducers were introduced, one of the selling points was it wasn’t fixed to a specific sequence abstraction.
In fairness, it’s hard to picture a scenario where you couldn’t have an IEnumerable wherever you might want one.
Transducers do not require streams. You might want to learn a bit more about Clojure before poorly generalizing about this feature of the language. The OP did use streams to explore some specific applications of transducers. To say that they are overhyped is to be hung up on the buzz surrounding their initial release nearly 8 years ago.
Yes, and like most language features it's not about the feature, it's about having _that feature_ in a language with other benefits.

Think generics in Go or concurrency (effects) in OCAML or smart pointers in Rust. Not at all unique things, but having them in the language with other benefits is worth some discussion as it may provide extra leverage in context.

I think this opinion is less nuanced than it could be. Perhaps this will help? https://hypirion.com/musings/haskell-transducers
Surely the example that says these two would be equivalent is wrong:

     foldl (+) 0 (take n myList)

     reduce (takeNPlus 10) 0 myList
I assume it should be (takeNPlus n)? (or (take 10 myList))
If I understand you correctly, transduces transform something like

(map fn1 (map fn2 (map fn3 collection)))

into

(map (fn [x] (fn1 (fn2 (fn3 x)))) collection); Is that correct?

More idiomatic clojure:

(map (comp fn1 fn2 fn3) collection)

It’s the mapping itself that can be composed with transducers. Where before you had pipelines of map/filter/whatever, now you have a single function representing the sequence operations, which can be used for any kind of sequence (a list in memory, or items coming in over a channel or message queue) item by item.
That sort of reminds of broadcast fusion [1] in Julia.

Funnily enough, Julia is where I came across the term transducers too, via the Transducers.jl package [2]. The article and the comments here now make me wonder what the difference is, between broadcast fusion and transducers.

[1] https://julialang.org/blog/2017/01/moredots/ [2] https://github.com/JuliaFolds2/Transducers.jl/

Generally, yes, but I also think the fn1,fn2,fn3 can happen independently (which is why its powerful). So items of collection may be at different steps.
You have the idea, yes.
> To give a shallow overview, transducers allow you to define steps in collection processing _per item_ rather than having collection processing as a series of transformations of collections.

This is a much better and clearer explanation than the entire article.