Hacker News
by 27182818284 4013 days ago
> Transducers are composable algorithmic transformations. They are independent from the context of their input and output sources and specify only the essence of the transformation in terms of an individual element. Because transducers are decoupled from input or output sources, ...

The biggest thing holding me back from learning Clojure is that I fear it will take me a decade to become remotely competent in it.

18 comments

If you have an understanding of functions like map, filter, and reduce, transducers are actually pretty easy.

Say you have `(map inc [1 2])`. You can run that, and get `'(2 3)`.

A transducer is the `(map inc)` part of that call (slightly confusingly, this isn't partial application or currying). You can apply it to something like `[1 2]`, but you can also compose it, say with `(filter even?)`, to get something that represents the process of incrementing everything and then removing odd numbers. Or you can apply it to things that aren't collections, like asynchronous channels, and get back a new channel with the values modified accordingly.
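A minimal sketch of the idea above, assuming Clojure 1.7 or later (where `map` and `filter` return transducers when called without a collection). `inc-then-keep-evens` is just an illustrative name.

```clojure
;; Calling map/filter without a collection returns a transducer.
;; comp chains them: values are incremented first, then odd ones are dropped.
(def inc-then-keep-evens
  (comp (map inc) (filter even?)))

;; The same transducer, reused in different contexts:
(into [] inc-then-keep-evens [1 2 3 4])        ; eager vector => [2 4]
(sequence inc-then-keep-evens [1 2 3 4])       ; lazy seq     => (2 4)
(transduce inc-then-keep-evens + 0 [1 2 3 4])  ; summed       => 6
```

With the core.async library on the classpath, `(chan 1 inc-then-keep-evens)` would apply the same transformation to values put on a channel; that part is left out here so the snippet runs with plain Clojure.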

That's pretty much it.

What I think I love most about Clojure is that there are fantastic, esoteric, academic ideas that, when I read about them in a context like this for the first time, I a) do not understand them, and b) have no idea how they would be useful. Then I read an example or two, and suddenly it's apparent that the tool is really as simple as it can be--there's very little accidental complexity--and is extremely useful.

The way you explain it, it's no different from functions and function composition; in which case, why invent new vocabulary?

I do remember looking into them before and translating them into Haskell and they ended up not being identical to functions in the trivial sense that you suggest, but I forget how.

Transducers are functions. The thing is that they are functions that are designed to serve as the functional argument to reduce. And they pair with ordinary functions which are not transducers.

For instance if we have (map inc [1 2]), there exists a transducer function T such that:

  (reduce T [1 2])  ==   (map inc [1 2])

I.e. we can somehow do "map inc" using reduce.

Okay?

Now, the clever thing is this: why don't we allow map to be called without the list argument? Just let it be called like this:

  (map inc)

This looks like partial application, right? Now what would partial application do? It would curry the "inc", returning a function of one argument that takes a list, i.e.:

  ;; if it were partial application, then:
  ((map inc) [1 2])  ;; same as (map inc [1 2])

But Hickey did something clever; he overloaded functions like map so that (map inc) returns T!

  (reduce (map inc) [1 2]) ;; same as (map inc [1 2])

The cool thing is that this (map inc) composes with other functions of its kind. So you can compose together the transducers of list processing operations, which are then put into effect inside a single reduce, and the behavior is like the composition of the original list processors.

It's like a linear operator; like the Laplace transform. Composition of entire list operations in the regular domain corresponds to composition of operations on individual elements in the "t-domain".

> (reduce (map inc) [1 2]) ;; same as (map inc [1 2])

This is wrong. You've missed the point.

    (map inc [1 2])

is actually roughly equivalent to

    (reduce ((map inc) conj) [] [1 2])

which, due to the use of `reduce`, is eager. To get laziness back:

    (sequence (map inc) [1 2])

Transducers are not reducing functions; they return reducing functions when applied to reducing functions. `((map inc) conj)` is a version of `conj` that calls `inc` on each rhs arg before `conj`ing it into the lhs arg.
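To make the distinction concrete, here is a small sketch (assuming Clojure 1.7+) showing that `(map inc)` is a function from reducing function to reducing function, and that `transduce` does the wrapping for you. `xf` and `inc-conj` are illustrative names.

```clojure
;; (map inc) is the transducer; it is not itself a reducing function.
(def xf (map inc))

;; Applying it to conj yields a new reducing function that calls inc on
;; each input before conj-ing it onto the accumulator.
(def inc-conj (xf conj))

(inc-conj [10] 1)               ; => [10 2]
(reduce inc-conj [] [1 2])      ; eager => [2 3]

;; transduce applies xf to conj for you, and also calls the completion
;; (one-argument) arity once the input is exhausted.
(transduce xf conj [] [1 2])    ; => [2 3]

;; sequence produces a lazy result, so an infinite input is fine as long
;; as only a finite prefix is consumed.
(take 3 (sequence xf (range)))  ; => (1 2 3)
```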

I suspected I wasn't quite understanding something: why would we want, say, a map based on reduce that chokes on lazy lists, in a language where laziness is important?

Why have new vocabulary for the state monad?

Transducers are mostly function and function composition, but with a specific signature. There are a handful of contracts on a transducer, so it is useful to have a name, so you can say "this function takes a transducer" and "this function returns a transducer".

Older collection functions were semi-lazy by default. They auto-realized in chunks of 32. Transducers wrap the functions then pass the data through, allowing full-laziness (or not if you want) by default.

They also work well in cases where you don't know if/when a new value is coming, like in channels or observables. This is because they aren't necessarily wed to the seq abstraction from the get-go.

I think people are seriously underestimating the protocol involved, and understanding that protocol is required if you want to build your own transducer-compatible streaming, which is always useful. There's also the issue that the protocol in question may not be generic enough. The presentations I've seen claim that it might work with Rx streams as well; however, I don't think it can deal with back-pressure. This is just a personal opinion and I'd like to be proven wrong.
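A rough sketch of what "the protocol" means from the side of someone writing a new source: the reducing function a transducer produces has an init arity `(rf)`, a step arity `(rf acc x)` and a completion arity `(rf acc)`, and a step may return a `reduced` value to request early termination. `push-all` is a hypothetical helper, not part of any library, and it deliberately ignores back-pressure and concurrency.

```clojure
(defn push-all
  "Hypothetical driver (not a library function): feeds `values` one at a time
  through transducer `xf` into reducing function `rf`, honoring the init, step
  and completion arities and stopping early on a `reduced` value."
  [xf rf values]
  (let [step (xf rf)]
    (loop [acc (step)                 ; init arity: starting accumulator
           vs  (seq values)]
      (cond
        ;; early termination: unwrap, but still run completion
        (reduced? acc) (step (unreduced acc))
        ;; step arity: push the next value through the chain
        vs             (recur (step acc (first vs)) (next vs))
        ;; completion arity: stateful transducers flush buffered state here
        :else          (step acc)))))

(push-all (map inc) conj [1 2 3])          ; => [2 3 4]
(push-all (take 2) conj (range))           ; stops early on infinite input => [0 1]
(push-all (partition-all 2) conj [1 2 3])  ; completion flushes [3] => [[1 2] [3]]
```

The `partition-all` case is why the completion arity matters: without it, the trailing partial batch would be lost.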

That said the concept is pretty cool and we shouldn't fear learning new things. After all, the whole point of learning a new programming language is to be exposed to new ways of solving problems, otherwise why bother?

Adding to weavejester's comment, back-pressure is to be handled with core.async channels [1]. I guess in Rich's terms, RX/FRP "complects" the communication of messages with flow of control [2]. This last statement may not be true anymore, though, given that RX now has many functions to control the scheduler (e.g. observeOn/buffer/delay, etc.).

[1] http://clojure.com/blog/2013/06/28/clojure-core-async-channe...

[2] http://stackoverflow.com/questions/20632512/comparing-core-a...

Transducers are, I believe, largely orthogonal to back-pressure concerns. They describe only how to transform the data at each step.

That's not correct, because if they described only how to transform the data at each step, then you couldn't describe `take` or `flatMap`.

Because Clojure isn't a pure functional language, transducers may be stateful. `take` uses a volatile (i.e. a fast, mutable variable) to retain state. I don't believe a `flatMap` transducer exists in Clojure yet.
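A short sketch of both behaviors, assuming Clojure 1.7+. `take` keeps a counter between steps, and each application of `(take 2)` to a reducing function gets its own fresh counter. For the flatMap case, note that Clojure 1.7 does ship `cat` and a one-argument `mapcat`, which expand one input into several outputs. `first-two` and `explode` are illustrative names.

```clojure
;; take is stateful: each application to a reducing function creates a fresh
;; counter, so the same transducer value can be reused safely.
(def first-two (take 2))

(into [] first-two [1 2 3 4])  ; => [1 2]
(into [] first-two [5 6 7])    ; fresh state again => [5 6]

;; flatMap-style expansion: each input becomes zero or more outputs.
(defn explode [n] (repeat n n))           ; illustrative: 3 -> (3 3 3)

(into [] (mapcat explode) [1 2 3])        ; => [1 2 2 3 3 3]
(into [] (comp (map explode) cat) [0 2])  ; spelled out; 0 -> nothing => [2 2]
```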

Don't worry too much if you don't get the esoteric stuff in the beginning. The fact that it's there just means that there's room to grow. The core language is actually very simple and quite straightforward. It just allows for some quite mind-boggling stuff.

Try Carin Meier's Living Clojure if you feel up for an intro.

And then check out the new Clojure Applied (by me), which aims to be a good intermediate-level intro. https://pragprog.com/book/vmclojeco/clojure-applied

Clojure Applied is an excellent book so far. I've been using Clojure for a long time, and it's still been very helpful in updating idioms for modeling domains, composing applications, etc.

I'd say it's required reading for working Clojure programmers, of any experience level. We really needed a "how to build applications in Clojure" to complement all the "how to do stuff in Clojure" books that are already out.

+1 I looked at your table of contents: nice topic selection.

The free excerpt was a good read. You might edit your comment to provide a direct link to that.

2nd for Living Clojure. Great crash course in Clojure for working developers.

Might as well also check out The Joy of Clojure by Michael Fogus and Chris Houser; really good book.

Transducers create functions that are meant to be passed to reduce (i.e. they take an accumulator and an element and return a new accumulator).

It's possible to implement most other functions (map, filter, take, etc.) using reduce. Transducers take advantage of that.

Unfortunately, if you try to do that, you'll notice that the implementation is sometimes tied to the way you build the entity: e.g. for vectors, map implemented in terms of reduce would start with an empty vector and then append to it.
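A small sketch of that naive approach; `map-via-reduce` and `filter-via-reduce` are hypothetical helpers, not core functions. Note the `[]` and `conj` baked into each one: that is the coupling to vectors described above.

```clojure
;; Hypothetical helpers: map and filter written directly with reduce.
;; Both hard-code how the result is built: start from [] and conj onto it.
(defn map-via-reduce [f coll]
  (reduce (fn [acc x] (conj acc (f x))) [] coll))

(defn filter-via-reduce [pred coll]
  (reduce (fn [acc x] (if (pred x) (conj acc x) acc)) [] coll))

(map-via-reduce inc [1 2 3])         ; => [2 3 4]
(filter-via-reduce even? [1 2 3 4])  ; => [2 4]

;; Chaining them builds a full intermediate vector at every step, and the
;; output is always a vector, whatever the input was.
(filter-via-reduce even? (map-via-reduce inc '(1 2 3 4)))  ; => [2 4]
```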

That sucks. We want operation chains that are independent of the data structure they operate on. We want them to work on vectors, lists, channels, whatever: anything that can have a reduce-like operation (anything reducible).

However, it turns out you don't necessarily have to recreate the entity (e.g. list) when chaining. All you need to know how to do is invoke the next operation in the chain, i.e. the next reduction function. For example:

* map can apply the function on the element to get a new result, then call the next reduction function with the accumulator and that new result.

* filter can check the element and either return the old accumulator or apply the next reduction function with the element, etc.

Therefore, transducers simply take an additional argument: the next reduction function that should be applied. This lets you build up a chain of operations (e.g. map filter etc) that doesn't care about the kind of entity they're operating on. Only the last "step" must be a reduction function that needs to know how to build the result.
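The two bullets above can be sketched as hand-written transducers. `my-map` and `my-filter` are hypothetical stand-ins for what the one-argument `(map f)` and `(filter pred)` do in Clojure 1.7+ (the real ones also handle extra arities); each takes the next reducing function and returns a new one.

```clojure
;; Hypothetical hand-written equivalents of the one-argument (map f) and
;; (filter pred). Each takes the *next* reducing function and returns a new one.
(defn my-map [f]
  (fn [next-rf]
    (fn
      ([] (next-rf))                    ; init: delegate
      ([acc] (next-rf acc))             ; completion: delegate
      ([acc x] (next-rf acc (f x))))))  ; step: transform, then pass on

(defn my-filter [pred]
  (fn [next-rf]
    (fn
      ([] (next-rf))
      ([acc] (next-rf acc))
      ([acc x] (if (pred x)
                 (next-rf acc x)        ; keep: pass it on
                 acc)))))               ; drop: accumulator unchanged

;; Nothing above mentions vectors, lists or channels; only the final
;; reducing function (conj, +, ...) knows how results are built.
(def inc-evens (comp (my-map inc) (my-filter even?)))

(transduce inc-evens conj [] [1 2 3 4])  ; => [2 4]
(transduce inc-evens + 0 [1 2 3 4])      ; => 6
(into '() inc-evens [1 2 3 4])           ; conj onto a list => (4 2)
```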

Actually the learning curve is very gentle, and you can gradually expand from basic features into the more advanced stuff when ready. Many of the Clojure features that have sophisticated underlying design principles (like transducers) can be used cookbook style from easy-to-understand examples. Jump in, the water's warm.

Clojure is one of the easiest languages to learn. You need to know very few concepts to become productive in it. I get co-op students every 4 months at work and they generally start doing useful stuff within a week or so.

I wrote a conceptual starter guide a little while back (https://yogthos.github.io/ClojureDistilled.html) that covers all the basics you need to know to get up and running.

You don't need to learn every feature to be competent in the language. Some are advanced features that are not required in day-to-day programming (e.g. transients: you can go forever without touching them, but if you need performance they're there). A lot of the advanced things are of most interest to library authors.

You may feel less intimidated by "Clojure for the Brave and True." The landing page features a rodent with an eye patch and a sword.

http://www.braveclojure.com/

> Delete ~/.emacs or ~/.emacs.d if they exist

Ehhhh, no!

Hence the emphasis on Clojure for the Brave.

Don't get disheartened. I just started this tutorial last weekend (and had no Emacs experience before) and it's been great!

Honestly, I use Clojure 1.7 for work every day and I don't even know what a transducer is.

It's easy to overstate the necessity of advanced features, ones very few programmers probably actually use.

Transducers are part of the cultural drive of the Clojure community to identify common patterns and simplify design. Rich's presentations do a nice job of explaining how a transducer is about separating [how to do the work] from [where the work is done]. By "work" I mean mapping, filtering, reducing, and so on. By "where the work is done", I mean that a transducer doesn't care what kind of data structure it operates on.

If you keep this in mind, perhaps re-reading http://clojure.org/transducers won't be as intimidating.

As a general comment, to those new to Clojure, when you find something in Clojure that seems strange or different, I encourage you to ask "What does [function X] care about? (e.g. need to know)" and "What does [function X] not need to know?". This relentless drive to simplify design and responsibilities means that functions are small and "opinionated", but in ways that are driven by constraints, not arbitrary decisions. So these choices make a lot of sense -- I'd argue they flow pretty naturally.

Don't get me wrong; I'm not knocking it. I definitely see the value in these kinds of features.

Like zippers, I'm sure it's another case of "If you understand it, it's really useful."

The point is, you don't necessarily need it to get the job done, and you certainly don't need to be intimidated about it or feel obligated to learn it right off.

Even Haskell isn't actually so difficult to grasp the basics of, if you keep this in mind and just think "Do IO in a 'do' block" and go about your way. See [1].

You don't have to know everything in any language just to get some work done, a lot of times advanced features are just that: advanced, stuff for doing things a bit more efficiently, or to handle certain rough edge cases. Learn them in their own time, and they'll make you better at what you do, but don't get wrapped up too much in expecting perfect efficiency from yourself.

A lot of programmers seem to be, as a people, kinda bad at this kind of self-reflection, like we're all sheep in wolves' clothing trying to avoid showing a hint of weakness. I know I sure am. There's nothing wrong with not knowing something; just try to take a moment to learn it when you can.

[1] http://blog.jle.im/entry/io-monad-considered-harmful

No need. Getting started, and to a level where you can be productive, is very easy. LISP is a very simple language, and lots of the features of Clojure mean that it's simple in practice too. That's compared to Haskell and Scala, which do require a little theory in order to get started.

IMHO.

There may come a time when you want to know about transducers, but by then the concepts won't seem intimidating.

This is only slightly complicated if you're writing your own transducer operation.
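A minimal sketch of a custom transducer operation, assuming Clojure 1.7+ for `volatile!`/`vswap!`. The "slightly complicated" parts are supplying all three arities and keeping any state inside the `(fn [rf] ...)` so that each use gets its own. `running-sum` is a made-up example, not a core function.

```clojure
(defn running-sum
  "Hypothetical stateful transducer: emits the running total of its inputs."
  []
  (fn [rf]
    (let [total (volatile! 0)]       ; fresh state each time it wraps an rf
      (fn
        ([] (rf))                    ; init: delegate
        ([acc] (rf acc))             ; completion: nothing buffered to flush
        ([acc x] (rf acc (vswap! total + x)))))))

(into [] (running-sum) [1 2 3 4])                         ; => [1 3 6 10]
(into [] (comp (filter odd?) (running-sum)) [1 2 3 4 5])  ; => [1 4 9]
```

Once written, it composes with the built-in transducers like any other.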

What is interesting is that these transducing operations can be written quite generically - code reuse is huge.

Usage of transducers is similar to Java 8 Streams or Haskell stream fusion, except that the implementation is fully generic, not dependent on the container/source.

That's not a very plain-language description of transducers. Transducers are functions which can be applied to any sort of collection or sequence. You can use them to do something like filter a collection or an input stream with the same code. They are basically just a way to make things like 'map' and 'filter' work on datatypes other than sequences, such as streams or channels.

http://blog.cognitect.com/blog/2014/8/6/transducers-are-comi...

Again, that's not true. Transducers are about reducing things. That's why a transducer first needs a reducing function. But I won't go into details about what transducers are. There are many good (and long) blog posts about it.

No, reducers are about reducing things. Transducers are about creating reducers that can be used on arbitrary collection-like things.

Reducers come from the observation that many higher order operations on collections can be built on reduce. You just supply the right reducing function and you get the equivalent to 'map' or 'filter'. This is useful because when you start chaining these higher order functions you can gain performance by replacing the multiple function calls with a single reducer-based function.

To take the example from the main docs:

    (fold + (filter even? (map inc [1 1 1 2])))

Here, each function returns a reducer which is a combination of the original collection ([1 1 1 2]) and a reducing function which will be applied to the collection. Ultimately this code will result in a single call to reduce with a single function applied to [1 1 1 2]. This differs from the 'standard' way of doing this:

    (reduce + (filter even? (map inc [1 1 1 2])))

...in that no intermediary representations of the collection are necessary. Neat.
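The docs example written out in full, assuming the reducers namespace is aliased as `r` (the `fold`, `filter` and `map` in the first snippet are the `clojure.core.reducers` versions). Both forms compute the same sum; they differ only in how the work is done.

```clojure
(require '[clojure.core.reducers :as r])

;; Reducers version: r/map and r/filter return reducible "recipes"; no
;; intermediate collection is built, and r/fold may split large vectors
;; across threads.
(r/fold + (r/filter even? (r/map inc [1 1 1 2])))  ; => 6

;; Sequence version: map and filter each produce an intermediate lazy seq.
(reduce + (filter even? (map inc [1 1 1 2])))      ; => 6
```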

A transducer takes the same idea, but does it in a way that lets you apply it to arbitrary collection-like things, not just seqs. A transducer works by cutting out the original collection; you build the reducing function by chaining transducers together and later pass it to another function, which will pick apart the collection-like thing.

So:

    (into [] (r/filter even? (r/map inc [1 1 1 2])))

Becomes

    (into [] (comp (map inc) (filter even?)) [1 1 1 2])

Which is not exciting until you realize that that middle term can be passed to 'chan' à la:

    (chan 1 (comp (map inc) (filter even?)))

Which means that everything going through that channel will be incremented and then filtered with 'even?'. Now you have a suite of functions that take a transducer and use it in a lot of different contexts, allowing you to use the same logic on streams, sequences, channels, etc.: the same logic that you would originally have expressed with a series of calls to 'map', 'filter', 'take' and applied only to a sequence.
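One detail worth checking when translating nested calls into `comp`: transducers composed with `comp` run left to right, the opposite of how nested sequence or reducer calls read. A small sketch with the same input, assuming Clojure 1.7+ and the reducers namespace aliased as `r`:

```clojure
(require '[clojure.core.reducers :as r])

;; Nested reducer calls read inside-out: map inc first, then filter even?.
(into [] (r/filter even? (r/map inc [1 1 1 2])))     ; => [2 2 2]

;; Transducers in comp run left to right, so map inc also goes first here.
(into [] (comp (map inc) (filter even?)) [1 1 1 2])  ; => [2 2 2]

;; Swapping the order filters first: only 2 survives, then becomes 3.
(into [] (comp (filter even?) (map inc)) [1 1 1 2])  ; => [3]
```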

Clojure is sophisticated [0], but it is rigorously boiled down to necessary complexity. This means that ordinary language features like interop [1] are well sugared, and more abstract concepts like transducers are simple to implement and map well to their description [2]. Stuart Halloway's Programming Clojure captures this idea.

[0] and big for practical purposes like Common Lisp.

[1] you can get at all of Java perhaps better than you can from Java using the REPL.

[2] like lexical scope and continuations in Scheme.

I had a similar feeling when I first started using Clojure, but I found it was more a discomfort with discussing programming languages using a direct and accurate vocabulary. After a while I've become much more comfortable reading about and watching videos on programming languages in this style, and I really enjoy being able to discuss and think about design decisions in a straightforward way.

Basic competence comes pretty quickly; it's a small core language compared to most. Most things like transducers you can ignore for a long time. The longest part of learning it for me was learning to think functionally, but you can do that in pieces, and nearly all of what you learn is applicable to any functional language, or even to imperative languages with functional aspects.

It is one of the simplest and easiest languages around.

The addition of transducers is an unfortunate case of Clojure "pulling a Haskell", valuing an elegant abstraction over ease of understanding and learning. Indeed, your comment alone shows that doing them (and especially giving them a high profile) was a mistake. Just because you can abstract something elegantly doesn't mean you should. No beautiful abstraction is worth scaring people away. Fortunately, Clojure doesn't make many such mistakes, and it usually tries to err on the side of pragmatism. I hope transducers aren't the beginning of a trend.

But just don't use transducers until you feel comfortable enough with the language. They're not an essential feature.

I don't understand this argument. Clojure shouldn't have transducers because the word sounds scary? Programming language designers should avoid adding powerful, higher-order abstractions because they are hard to understand? This sentiment is incredibly anti-intellectual. And like you said, if you don't understand it you don't have to use it.

Language design requires tradeoffs: the power of a feature, vs. how much more complex it makes the language to read/learn/etc. Making something easy to use doesn't commit one to anti-intellectualism; it just shifts the field of intellectual challenge elsewhere.

See Yegge's Perl essays[1] for examples of core-language features that are 'powerful' but problematic; hopefully that convinces you that at least the form of argument is legitimate. It could be that those Perl features are bad whereas transducers are great. I happen to share some skepticism of transducers because they overly resemble an existing feature (partial application) and can be implemented pretty easily with existing Clojure tools. Also, the marquee use case (sharing transformations between sequences and channels) so far seems exceedingly rare to me, though maybe things are moving in a direction where that's more important.

> And like you said, if you don't understand it you don't have to use it.

True to an extent, but (a) the high prominence given to the feature may well lead to a situation where you can't read other peoples' code without learning the concept, and (b) it's easy to create transducers by accident.

[1] e.g. ".." in https://sites.google.com/site/steveyegge2/ancient-languages-...

Everyone who is learning functional programming should be able to use map/reduce well. With that, understanding transducers is just natural.

Most languages with map/reduce don't have transducers (and they're certainly not central features). Also, don't forget that while Clojure is functional, it is also very much imperative (it's a functional-imperative language rather than a pure-functional language).
> Most languages with map/reduce don't have transducers

That's true but mostly irrelevant to the point that transducer usage isn't complex once one understands map/reduce, which are common, whether or not transducers are.

> This sentiment is incredibly anti-intellectual.

Yeah, he keeps trolling various discussions with his "Java is the best thing ever" chants.

> Programming language designers should avoid adding powerful, higher-order abstractions because they are hard to understand?

In general? Absolutely![1] Making algorithms easier for humans to understand is the whole purpose of abstractions.

Programming is based on two things: algorithms and abstractions, with algorithms being fundamental to computation and abstractions being usability features designed to help people write code -- that is their one and only purpose (computers and even theoretical computational models don't care about abstractions). Unlike algorithms, abstractions are not useful in isolation, and their utility is not a function of mathematical "power" but of psychological benefit. Their utility is measured by how much they help the human programmer write and read an algorithm[2]. Another way to look at it is that algorithms tackle essential complexity and abstractions tackle accidental complexity[3].

More specifically, abstractions help human programmers in two ways: they increase code reuse and improve code readability.

But here's the thing. There are things other than powerful abstractions that help humans program -- for example, a clear execution model (what happens when), debuggability, etc. This means that as abstractions become more powerful (i.e. more abstract), they do not necessarily perform their function -- namely, assisting developers -- any better. So every abstraction must be carefully weighed: how much wasted code does it save and how much more readable does it make code, vs. how much does it hurt understanding or debuggability.

I'd say transducers are just about the point where the abstraction starts hurting more than it helps. It's a borderline case. Now, I am not saying Clojure shouldn't have transducers (again, borderline), but that they most certainly shouldn't be emphasized.

> if you don't understand it you don't have to use it.

That doesn't quite work. I said, don't use them right away. Once a programming language and its libraries use an abstraction, you must learn it sooner or later. After all, most code you'll read isn't your own. This is why every language feature has a cost, and why good language developers are hesitant about introducing new features (transducers aren't a language feature, but they do have a prominent place in the standard library).

---

[1]: For example, Java and Go are both languages whose designers intentionally and radically reduced the use of many of the abstractions available in other popular languages of their time. Java in particular drastically removed abstractions made possible by the most popular language at the time it was introduced, and attained tremendous success, partly because of that (well, at the time it was already apparent that C++'s power -- in addition to its lack of safety -- was greatly detrimental to code maintenance in most parts of the industry).

[2]: Abstractions are secondary to algorithms. Also note how almost all sub-disciplines of computer science deal with algorithms, and just one, rather small, discipline -- PL research -- is concerned with abstractions.

[3]: Even algorithms are often not judged in isolation; there are lots and lots of useful algorithms that aren't used because they are too hard to implement and maintain correctly -- regardless of abstractions used.

> For example, Java and Go are both languages whose designers intentionally and radically reduced the use of many of the abstractions available in other popular languages of their time.

Java is a perfect example of proliferation of accidental complexity caused by the unwillingness to provide facilities for composing abstractions in the core language. The Java community has resorted to massive external XML-based configuration files to provide code reuse where it is impossible to achieve inside the language.

The whole point of Lisp (and Clojure is a Lisp) is the power to compose functions and lists together, building higher and higher-level abstractions to allow concise expression of logic to solve a problem. The point of Lisp is to be expressive and powerful, not popular and readable. BASIC and COBOL were popular and readable.

> Java is a perfect example of proliferation of accidental complexity caused by the unwillingness to provide facilities for composing abstractions in the core language.

Well, those are all tradeoffs, and the fact is that since the addition of annotations and later lambdas, all those "external XML-based configuration files" are receding, to the point they no longer exist in almost any of the newer libraries (or new versions of old libraries).

Java didn't start out with insufficient abstractions that were later added by other languages. Java started out as a reaction to languages with overly-powerful abstractions that hindered maintenance. You may not like the result and think it aesthetically unpleasing (although it's been getting better and better for quite a while now), but it is a fact that Java codebases are extremely maintainable. This is not a guess or a gut feeling. Those legacy Java codebases exist as a living proof of that. Other languages' legacy code was either thrown away or frozen, unmaintained. And if you think "good, codebases shouldn't live for too long", well, that's a nice sentiment, but the fact is that long-lived codebases save the industry a lot of time and money (even if young developers think they could have done it better if they'd just started from scratch). Again, we know that because we've seen the alternative.

BASIC and COBOL were never nearly as maintainable as Java (I know becas. I love Lisp (Scheme was among my first languages, and Clojure is my favorite application-programming language), but the point of a language designed for the industry shouldn't be expressiveness or power, nor readability, but usefulness and maintainability. Any other property should serve that. Professional software is written to serve a purpose -- they're not pieces of art (or not just pieces of art). You must remember that the average lifespan of a codebase is about a decade, and the cost of the project is spread -- unevenly -- over that decade. A language for the industry -- as well as other related tools -- is meant to reduce that cost.

I think Java does an excellent job of that, and I believe Clojure can do an excellent job -- we just don't have the data yet. But if you design a language for the industry, you must always look at the big picture -- at those ten years of the codebase as a goal -- that's the challenge. Coming up with a language that's powerful and expressive is easy. Doing that in a way that really serves the industry's need is much harder. Rich Hickey is a pragmatist, and he gets that. I think that the emphasis on transducers was a stumble, because he may have lost sight of the real goal.

Dive in! It's like beginning ballet over twenty: you'd really have to trust your slow, consistent, building momentum to eventually get you there. Learning Clojure up to getting remotely competent with it did take me a year and a half, during which I used it in all my projects, including getting into Emacs. I breathed it all throughout. Now I'm good at it, designing big FP systems comes easily, and when I finally decided to learn transducers a few weeks ago, it took me just an hour to grasp them and use them in my app. Not a decade :)

It's more like 18 months, unless you already know a Lisp or Haskell. The only thing I can say is that it is totally worth it; I know very few people who regret putting it in their head. The same is less true for Ruby, PHP, Visual Basic.

Are transducers necessary in Clojure because functions of functions of functions (etc...) on immutable data structures execute slowly?

They help relieve pressure on the garbage collector by eliminating intermediate sequence allocations (e.g. (->> coll (map a) (map b) (filter c) (map d)) would create 4 intermediate sequences, while the transducer (comp (map a) (map b) (filter c) (map d)) creates none). Consequently you can work with results incrementally unless an intermediate flush is required, such as for a windowed aggregation.
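A small sketch of that comparison, with arbitrary stand-ins for `a`, `b`, `c` and `d`; `xs` and `pipeline` are illustrative names. Both produce the same values. The difference is that the `->>` version allocates an intermediate lazy sequence per step, while `into` with the composed transducer makes a single pass.

```clojure
(def xs (range 10))

;; Nested sequence functions: every step allocates its own lazy sequence.
(->> xs
     (map inc)
     (map #(* 2 %))
     (filter #(zero? (mod % 3)))
     (map dec))
;; => (5 11 17)

;; Transducer version: one pass over xs, no intermediate sequences.
(def pipeline
  (comp (map inc)
        (map #(* 2 %))
        (filter #(zero? (mod % 3)))
        (map dec)))

(into [] pipeline xs)  ; => [5 11 17]
```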

I imagine most applications aren't very sensitive to the performance gain, but it's good to know for when you need it. In addition, it's a nice, testable, composable pattern. I've only done backend work, no cljs yet, but I'd call it reasonable practice to think in terms of compositions/transducers for most data transformation pipelines in the future, throughout the stack, whether performance matters or not.