Hacker News new | ask | show | jobs
Reduce vs. Fold in Common Lisp (n16f.net)
92 points by billiob 1113 days ago
8 comments

The history here is that Common Lisp gets reduce from APL (https://aplwiki.com/wiki/Reduce). It's not an attempt an ML-style fold, but a different formalism from a different lineage.
While reading this, I was immediately reminded of the reduce operator, glad to see my intuition wasn't far off.

The nifty thing about this operator in the array-langs compared to the usual fold function is that they usually define identity elements for all primitive functions, which means that no initial value has to be provided: https://aplwiki.com/wiki/Identity_element

The downside of this approach though, is that using reduce with non-primitive functions can result in domain errors (at least in APL). I think BQN's version of the operator is a bit nicer, in that it allows you to specify an initial value in this situation: https://mlochbaum.github.io/BQN/doc/fold.html

> they usually define identity elements for all primitive functions > using reduce with non-primitive functions can result in domain errors

ml-style folds in the presence of ad-hoc polymorphism solve this rather handily -- in haskell for instance monoid is the typeclass that only requires an associative operation and an identity element

typeclasses have some clunkiness in this regard; you have to wrap numeric types as "sum" or "product" etc to go "ah yes today i want to say numbers are a monoid under this operation" but at the very least it does enable formal, user-defined associations between identity elements and functions

luckily most things programmers deal with are plausibly just one kind of monoid. for instance the eleventy billion different string types haskell programmers love to use all tend to satisfy monoid under concatenation without any wrappers

> downside of this approach though, is that using reduce with non-primitive functions can result in domain errors

Yes, that's another problem. There is precedent for associating metadata with user-defined functions (eg inverses); identities seem to have fallen by the wayside, but I am planning to fix that for j.

Defining a number of related functions seems to be a pattern that comes up elsewhere. For example, consider functions that compute a hash value, canonicalize, and compute some notion of equality. It would be useful to associate all of these.
Haskell calls these 'typeclasses'; cl calls them 'protocols'. Apl style is not to expose sophisticated user-level abstractions, so I think that there it is not inappropriate that the scope of associable objects (say, a monad, a dyad, an inverse, and an identity; perhaps a few others) be fixed by the language.
I had thought APL style was to specify initial values simply by pre-catenating the desired initial value onto the argument?
No, as you can check with some of the weirder arithmetic functions:

        </ ⍬     ⍝ Empty list
  0
        </ ,5
  5
        </ 0 5
  1
        </ 5 0
  0
It would be more consistent in some ways though (for example forcing </ to always have a boolean result). APL designer Bob Smith's advocated for it, particularly for ,/ to make it behave better as a joining function: http://www.sudleyplace.com/APL/Reduction%20Of%20Singletons.p...
that doesn't work when the desired initial value is not the same shape as the major cells of the argument array
I never heard of apl's allowing for an explicit choice of initial value, nor direction control ('from-end'), which mls typically also provide (foldl vs foldr). On the other hand, the ad-hoc choice of result for an empty input is a flaw common to cl and apl.
…REDUCE functions can be called with zero, one or two list values.

No, it's either two or zero arguments.

Author here. I double checked the Hyperspec [1] and you're quite right. I'll fix the article tomorrow, thank you for noticing!

[1] http://www.lispworks.com/documentation/lw60/CLHS/Body/f_redu...

fold is a natural consequence of church lists, probably the most fundamental way of expressing the list abstraction:

\f \x (f a0 (f a1 (f a2 x)))

So fold is just applying a list (function) to 2 arguments. Or you can be helpful and make something like fold := \f \x \l l f x which is useful for binding the f and the x and applying to multiple lists (everything is Curried of course)

LISP is not quite based on lambda calculus, so it should be no surprise it doesn't quite get reduce(i.e. fold) right.

See also church numerals, which are like lists but without the elements, they also have a 'fold':

\f \x f (f (f x))) == 3

We can make trees! Which again also have a 'fold'

\f \x f a0 (x a1) (f a2 (x a3) (x a4))

And many other more exotic folding data structures.

> It then applies the function to successive pairs of sequence elements.

No. #'reduce may take the first pair as an optimization step, but from that point on it processes sequence elements one at a time. It passes an accumulated value and the next sequence value to the function.

>Fold is also simpler than REDUCE since it does not have any special case, making it easier to reason about its behaviour.

If returning the initial value when the list is empty is considered a special case (or "surprising aspect") of REDUCE, then it's the same for FOLD, no?

The initial value is optional for `reduce`, but required for `fold`. If you don't pass the initial value to `reduce`, and the sequence argument is empty, then `reduce` calls the function with no arguments, which is unique - in all other cases, it is called with two arguments. `fold` always calls its function argument with two arguments.

This shows up more clearly in statically-typed functional languages, where variadic functions like this are far less common. In that case, you typically see that `reduce` returns an option type, whereas `fold` does not. The types would look something like `fold :: (a -> b -> a) -> a -> List b -> a` vs `reduce :: (a -> a -> a) -> List a -> Option a`.

The most surprising aspect of REDUCE is the way the callback can be called depending on both the length of the list and the presence of absence of an initial value.
I don't have a running image handy, but

  (+) ; => 0
  (*) ; => 1
and

  (+ n) ; => n
  (* n) ; => n
which I expect has some bearing on the behavior of reduce in the examples given.

It's pretty obvious that any other function could either have or be advised to have whatever equivalent semantics are appropriate.

Of course

  (apply #'+ '(1 2 3 4 5)) ; => 15
So reduce can be obviated by just letting the function take variable args too.
> So reduce can be obviated by just letting the function take variable args too.

In Common Lisp the max number of arguments can be small.

  $ abcl
  Armed Bear Common Lisp 1.8.0
  Java 11.0.19 Ubuntu
  OpenJDK 64-Bit Server VM
  Low-level initialization completed in 0.304 seconds.
  Startup completed in 1.501 seconds.
  Type ":help" for a list of available commands.

  CL-USER(1): CALL-ARGUMENTS-LIMIT
  50
Interesting. Do you know if that is due to constraints coming from ABCL running on a JVM, or is it an arbitrary choice ? (By contrast, SBCL on a x86_64 laptop returns 4611686018427387903...)
Usually these limits are based on the capabilities of the underlying platform (like the JVM) or the implementation choice. One of the goals is to have fast function calls and less so to have long arguments lists. Less than 2^16 isn't that rare. SBCL is more on the large side. Implementations like ABCL, CLISP and some others have much smaller max arglist length limits.

Don't use

  (apply #'+ long-list-of-numbers)
but use

  (reduce #'+ long-list-of-numbers)
So this wasn't what I was expecting.

  * (disassemble (lambda () (apply #'+ '(1 2 3 4 5))))
  ; disassembly for (LAMBDA ())
  ; Size: 21 bytes. Origin: #x5345C11B                          ; (LAMBDA ())
  ; 1B:       498B4510         MOV RAX, [R13+16]                ; thread.binding-stack-pointer
  ; 1F:       488945F8         MOV [RBP-8], RAX
  ; 23:       BA1E000000       MOV EDX, 30
  ; 28:       488BE5           MOV RSP, RBP
  ; 2B:       F8               CLC
  ; 2C:       5D               POP RBP
  ; 2D:       C3               RET
  ; 2E:       CC10             INT3 16                          ; Invalid argument count trap
  NIL
  * (disassemble (lambda () (reduce #'+ '(1 2 3 4 5))))
  ; disassembly for (LAMBDA ())
  ; Size: 21 bytes. Origin: #x5345C1AB                          ; (LAMBDA ())
  ; AB:       498B4510         MOV RAX, [R13+16]                ; thread.binding-stack-pointer
  ; AF:       488945F8         MOV [RBP-8], RAX
  ; B3:       BA1E000000       MOV EDX, 30
  ; B8:       488BE5           MOV RSP, RBP
  ; BB:       F8               CLC
  ; BC:       5D               POP RBP
  ; BD:       C3               RET
  ; BE:       CC10             INT3 16                          ; Invalid argument count trap
  NIL
There is some SBCL compiler optimizer at work.
so foldl is basically

  (defun foldl (function value sequence)
    (reduce function sequence :initial-value value))
Yeah, I'm not seeing what's so special either. Maybe it's that you do have to specify that initial value, so your return types are never something you don't expect?
The numeric tower in Common Lisp may surprise us with simple things as addition:

  (+ #c(10 -1) #c(20 1))
the result of adding these two complex numbers is the integer 30.
Huh. I did not know that. I am surprised that it actually coerces the result to an integer, although that's certainly an artificial type edge case.

    SBCL is free software, provided as is, with absolutely no warranty.
    It is mostly in the public domain; some portions are provided under
    BSD-style licenses.  See the CREDITS and COPYING files in the
    distribution for more information.
    * (+ #c(10 -1) #c(20 1))
    30
    * (type-of (+ #c(10 -1) #c(20 1)))
    (INTEGER 0 4611686018427387903)
For example, if you do this, you do get a complex number.

    * (type-of (+ #c(10 -1) #c(20 1.0))) 
    (COMPLEX (SINGLE-FLOAT 0.0 30.0))
+ is not associative for many types.
True, which is why Common Lisp accepts a parameter to specify which order the + should be applied - from left to right or vice versa. Left to right is default.