No, recursion still works well without TCO (though as a Schemer, I love TCO). I was programming in BCPL in the early 1970s, and it handled recursive procedures with aplomb. The big revolution was realizing that, if you don't allow access to automatic variables declared in outer scopes, you could store all the variables in the stack frame, and access them with a small offset from the stack or frame pointer. That made automatic variables just about as fast as static ones (which, on System/360, had to be accessed via a base register), with small overheads at call and return sites.
Again on System/360, I benchmarked BCPL procedure call costs against subroutine call costs in Fortran G (the non-optimizing compiler). BCPL was about 3 times faster.
That said, as soon as you added multi-tasking (what we'd now call threads), it all went to hell. It's not an accident that one IBM PL/I manual of the 1960s said “Do not use procedures, they are expensive.”
As mentioned by others, it was the tiny stack in the 6502 that killed this approach. I appreciate all those who pine for the 6502, but it made implementing modern (even for the 1970s) languages almost impossible.
Adaptive quadrature is an algorithm for numerical integration. You have a function of one variable, and two endpoints of a range, and an error limit. You want to return the value of the integral of that function over that range, a value that is no further from the correct value than the error limit.
What you do is, you do a three-point approximation and a five-point approximation. The difference between the two gives you a fairly good estimate of the error. If the difference is too high, you cut the region in half, and recursively call the same function on each half.
That calling twice is what makes it hard for a while loop. I mean, yes, you could do it with a work queue of intervals or something, but it would be much less straightforward than a recursive call.
Hey, this is awesome! I'd never heard of adaptive quadrature before. Thanks for the tip! Here's what I hacked together from your description and looking up the Newton–Cotes formulas to get a reasonable order of convergence:
local function q(f, start, fstart, mid, fmid, stop, fstop, ε)
-- Scaled approximation of integral on 3 points using Simpson’s
-- rule.
-- <https://en.wikipedia.org/wiki/Newton%E2%80%93Cotes_formulas#Closed_Newton%E2%80%93Cotes_formulas>
local Q3 = (1/6)*(fstart + 4*fmid + fstop)
-- Scaled approximation of integral on 5 points.
local mid1, mid2 = (1/2)*(start + mid), (1/2)*(mid + stop)
local fmid1, fmid2 = f(mid1), f(mid2)
-- Boole’s rule, giving exact results for up to cubic polynomials
-- and an O(h⁷) error. This means that every additional level of
-- recursion, adding 1 bit of precision to the independent
-- variable, gives us 7 more bits of precision for smooth
-- functions! So we need like 256 evaluations to get results to
-- 53-bit machine precision?
local Q5 = (7/90)*(fstart+fstop) + (32/90)*(fmid1+fmid2) + (12/90)*fmid
--print(mid, Q3, Q5)
-- Approximate the error by comparing the two.
if (stop - start)*math.abs(Q5 - Q3) <= ε then return (stop - start) * Q5 end
-- Recursively subdivide the interval.
return
q(f, start, fstart, mid1, fmid1, mid, fmid, ε) +
q(f, mid, fmid, mid2, fmid2, stop, fstop, ε)
end
function adaquad(f, start, stop, ε)
local mid = (1/2)*(start + stop)
-- In some quick testing, ε/2 seems to work okay.
return q(f, start, f(start), mid, f(mid), stop, f(stop), (1/2)*ε)
end
return adaquad
Again on System/360, I benchmarked BCPL procedure call costs against subroutine call costs in Fortran G (the non-optimizing compiler). BCPL was about 3 times faster.
That said, as soon as you added multi-tasking (what we'd now call threads), it all went to hell. It's not an accident that one IBM PL/I manual of the 1960s said “Do not use procedures, they are expensive.”
As mentioned by others, it was the tiny stack in the 6502 that killed this approach. I appreciate all those who pine for the 6502, but it made implementing modern (even for the 1970s) languages almost impossible.