Hacker News new | ask | show | jobs
by fnordsensei 3871 days ago
The idea is that you can define a transformation separate from its use and then reuse it in many contexts. Some of the win sits in decoupling what you are doing from when and where you are doing it.

Rich Hickey wrote about them on Cognitect's blog a while back: http://blog.cognitect.com/blog/2014/8/6/transducers-are-comi...

The author refers to the test suite for examples: https://github.com/cognitect-labs/transducers-python/blob/ma...

1 comments

I always had a hard time figuring out how transducers are different "stream transformation" functions like map and filter.

I know that they are more complicated/powerful than that but I can't quite point that out precisely.

You can transform map or filter into a transducer by decoupling the transformation from the data. When you call map, you usually call it on some data, or some stream. When you create a transducer, and you call for example (map inc), you're saying "whenever this transducer is applied, on whatever it is applied to, increase each unit by one".

You can lug around the transformation itself as if it were an object, almost, separate from the data it's meant to transform.

How is that different from just partially applying/currying the map?
A key component to transducers is to realize that map, filter, take, can all be expressed as special cases of the reduce function.

So a curried map function can be expressed as a reduction operation, for example, in python you could do..

  # map that adds 1 to each element
  # same as curried map(sub1)([1,2,3])
  sub1 = lambda x: x - 1
  map(sub1, [1,2,3])
  
  # same outcome using reduce
  reduce(lambda acc, x: acc + [sub1(x)], [1,2,3], [])
Notice that when writing a map operation using the reduce function, we've put two pieces of logic on each step. (1) we have a transformation called sub1, and (2) we have a step operation (returning an array with the transformed x appended to it).

We can use this same logic to create filters with an if statement..

  lt2 = lambda x: x < 2
  filter(lt2, [1,2,3])
  reduce(lambda acc, x: acc + [x] if lt2(x) else acc, [1,2,3], [])
A critical thing to notice in each of these cases is that if we ran a curried filter and map (with transformation / filtering functions already passed), then they would loop over the data twice. However, if you look at the example on this github repo, the want to

* take 3 elements from the sequence

* perform a transformation (map operation) to those elements

Notice that in this case take is being performed before the map, but you could imagine doing something like

* map items from geometric series to float

* filter items that are greater than .15

* take 3 of the remaining items

How would you do that with map and filter functions without looping over the entire (inexhaustible) sequence?

This is the beauty of transformers and transducers, they provide a composable approach to filtering, transforming, and taking elements from a (possibly infinite) sequence.

Of course, all of this could be done in a for loop, but often it introduces complexity..

  # assume geom_series is the generator from the github example
  take = 3
  for el in geom_series:
      if float(el) < .15:
          result.append(el)
          if len(result) == take: break
However, this for loop is using the same ideas: we have a transformation / if (predicate) operation, and step operation to append the data we want. The problem transducers try to solve is writing a series of these types of operations compactly and cleary as a pipeline of functions (erm, transformers), so those functions can be replaced/moved around quickly, and the pipelines can be combined together simply.

A good article that covers tranducers in javascript (sorry!) is here:

http://simplectic.com/blog/2014/transducers-explained-1/

> How would you do that with map and filter functions without looping over the entire (inexhaustible) sequence?

But isn't this just a matter of defining map and filter over a lazy stream datatype (aka iterators) instead of over lists?

So the workflow would be

   list -> stream -> filter1 -> filter2 -> list
This would let you run the filters "in parallel" without iterating through things twice.
Python's map and filter already run over iterators. If you're talking about creating filters that are composable, so that you could do

  filter1 = filter(predicate1)   # curried filter
  filter2 = filter(predicate2)   # curried filter
  pipeline = compose(filter1, filter2)
  pipeline(<generator of some kind>)
and make it so that the sequence of operations will be

  for el in some_generator:
    if predicate1(el) and predicate2(el): <accumulate value>
Then I would say defining them in this way is useful and important--transducers are exactly one way of doing this! A critical aspect I didn't go into detail on is the idea of a take function. In the github repo, T.take(3) is the portion that allows the transducer to operate of an infinite stream of values.

This is the piece your workflow example would need to take into account. How could I apply a filter followed by something that takes 3 passing values from an infinite sequence? I'm sure you could come up with a way, and it would be worth comparing to the transducer approach :).

(it's worth noting that currying map, filter, etc.. are very complimentary to the transducer approach)