Hacker News new | ask | show | jobs
by closed 3871 days ago
A key component to transducers is to realize that map, filter, take, can all be expressed as special cases of the reduce function.

So a curried map function can be expressed as a reduction operation, for example, in python you could do..

  # map that adds 1 to each element
  # same as curried map(sub1)([1,2,3])
  sub1 = lambda x: x - 1
  map(sub1, [1,2,3])
  
  # same outcome using reduce
  reduce(lambda acc, x: acc + [sub1(x)], [1,2,3], [])
Notice that when writing a map operation using the reduce function, we've put two pieces of logic on each step. (1) we have a transformation called sub1, and (2) we have a step operation (returning an array with the transformed x appended to it).

We can use this same logic to create filters with an if statement..

  lt2 = lambda x: x < 2
  filter(lt2, [1,2,3])
  reduce(lambda acc, x: acc + [x] if lt2(x) else acc, [1,2,3], [])
A critical thing to notice in each of these cases is that if we ran a curried filter and map (with transformation / filtering functions already passed), then they would loop over the data twice. However, if you look at the example on this github repo, the want to

* take 3 elements from the sequence

* perform a transformation (map operation) to those elements

Notice that in this case take is being performed before the map, but you could imagine doing something like

* map items from geometric series to float

* filter items that are greater than .15

* take 3 of the remaining items

How would you do that with map and filter functions without looping over the entire (inexhaustible) sequence?

This is the beauty of transformers and transducers, they provide a composable approach to filtering, transforming, and taking elements from a (possibly infinite) sequence.

Of course, all of this could be done in a for loop, but often it introduces complexity..

  # assume geom_series is the generator from the github example
  take = 3
  for el in geom_series:
      if float(el) < .15:
          result.append(el)
          if len(result) == take: break
However, this for loop is using the same ideas: we have a transformation / if (predicate) operation, and step operation to append the data we want. The problem transducers try to solve is writing a series of these types of operations compactly and cleary as a pipeline of functions (erm, transformers), so those functions can be replaced/moved around quickly, and the pipelines can be combined together simply.

A good article that covers tranducers in javascript (sorry!) is here:

http://simplectic.com/blog/2014/transducers-explained-1/

1 comments

> How would you do that with map and filter functions without looping over the entire (inexhaustible) sequence?

But isn't this just a matter of defining map and filter over a lazy stream datatype (aka iterators) instead of over lists?

So the workflow would be

   list -> stream -> filter1 -> filter2 -> list
This would let you run the filters "in parallel" without iterating through things twice.
Python's map and filter already run over iterators. If you're talking about creating filters that are composable, so that you could do

  filter1 = filter(predicate1)   # curried filter
  filter2 = filter(predicate2)   # curried filter
  pipeline = compose(filter1, filter2)
  pipeline(<generator of some kind>)
and make it so that the sequence of operations will be

  for el in some_generator:
    if predicate1(el) and predicate2(el): <accumulate value>
Then I would say defining them in this way is useful and important--transducers are exactly one way of doing this! A critical aspect I didn't go into detail on is the idea of a take function. In the github repo, T.take(3) is the portion that allows the transducer to operate of an infinite stream of values.

This is the piece your workflow example would need to take into account. How could I apply a filter followed by something that takes 3 passing values from an infinite sequence? I'm sure you could come up with a way, and it would be worth comparing to the transducer approach :).

(it's worth noting that currying map, filter, etc.. are very complimentary to the transducer approach)

If you are wondering how to make a lazy take() in python, here is one solution:

    def take(num):
        def gen(iterable):
            for i, item in enumerate(iterable):
                if i == num:
                    break
                yield item
        return gen
and then

    list(take(3)(range(100)) == [0, 1, 2]