Isn't Haskell internally using a variant of the SKI combinator calculus to represent that all functions are data? This would mean less reduction for an already reduced program.
That would be true for a toy Haskell compiler. “The standard” Haskell compiler, GHC, can do a whole lot of optimisations and produces really good machine code nowadays.
Other people have provided good answers, but I'm just gonna add another link explaining Y Combinator ``Why of Y" by Matthias Felleison[1] , which is so far my favorite explanation.
The Y combinator (which is behind the name for this website) lets you define recursive functions without actually using recursion (the only recursive definition is Y itself, which is often available as a language primitive)
A less obfuscated version of the factorial using the y combinator would be this one:
fac' f n = if n == 0 then 1 else f (n - 1)
fac = y fac'
note how fac' is not recursive and instead calls its "f" parameter when it wants to do a recursive call. The y combinator then ties everything together by passing fac' to itself.
The other obfuscating factor in that program are the S,K,B and C combinators. The purpose of these one is to let you write a program without any variable definitions or lambdas. Notice that when that program finally defines `fact`, it has the form
fact = y {some big expression}
where the big expression only has function applications and references to the previously-defined combinators. There are no mentions to the "f" and "n" variables that we had before. The neat thing about the SK family of combinators is that you can use them to build any function, not just the factorial:
As for the speed comment, one way to implement lazy evaluation, as found in Haskell, is to compile programs down to those SK combinators and then use graph reduction to evaluate the result. I would expect a reasonable Haskell implementation to be able to automatically generate the unreadable combinator version of the program from the usual definition of factorial that we all know and love but maybe there was a Haskell implementation back in 2001 that wasn't as reasonable?
To be clear, Y itself doesn't have to be recursive either. You can express it as an ordinary lambda, and write a recursive program without any named values or functions.
The problem with Haskell is that you can't express Y as an ordinary lambda in a typed language unless your type system allows recursive datatypes (at which point you are also pushing the recursion back into a language primitive)
[1] https://en.wikipedia.org/wiki/SKI_combinator_calculus [2] https://en.wikipedia.org/wiki/B,_C,_K,_W_system