| There are two main parts to this: - GHC can do less unpacking (moving heap allocations to registers)because of laziness - laziness as an implementation detail is slow To implement laziness, GHC puts a closure with it's environment on the heap. When the value is used for the first time, the closure is called. Then the result is put on the heap, the closure is replaced with an indirection to the result, and the program continues. But this also has some advantages: - For values that might be used zeor or multiple times, this often actually saves time on average over both strict or call-by-name evaluation - This is faster than any equivalent code you can write by hand because compiler and runtime system heavily optimize around it - This gives you mutation through the backdoor which allows asymptotic improvements over strict pure languages in some cases But the main thing is that laziness allows runtime dead code elimination - if you have an unpacked vector of pairs and only use the left element of each pair, only those values will be allocated - still in a dense bytearray. |