Hacker News new | ask | show | jobs
by narski 695 days ago
I recently ran into this issue when trying to memoize a simple numerical sequence in Hoon (yes, that Hoon. I know, I know...).

Let's use the fibonacci sequence as an example. Let's write it the classic, elegant way: f(n) = f(n-1) + f(n-2). Gorgeous. It's the sum of the two previous. With the caveat that f(n=0|1) = n. In Python:

  # fib for basic b's
  def fib(n):
    ## Base case
    if n == 0 or n == 1:
      return n
    
    return fib(n-1) + fib(n-2)
Right off the bat, performance is O(n)=n*2. Every call to f(n-1) will also need to compute f(n-2) anyways! It's a mess. But since Python passes arrays and dictionaries as pointers (cough, sorry! I meant to say references) it's super easy to memoize:

  # optimize-pilled memoize-chad version
  def fib(n, saved={}):
    if n in saved:
      return saved[n]
    
    if n == 0 or n == 1:
      saved[n] = n
    else:
      saved[n] = fib(n-1) + fib(n-2)
    
    return saved[n]
Okay, now our version is nearly as fast as the iterative approach.

This is the normal pattern in most languages, memoizing otherwise "pure" functions is easy because you can reference a shared object using references, right? Even with multithreading, we're fine, since we have shared memory.

Okay, but in Hoon, there are no pointers! Well, there kinda are. The operating system lets you update the "subject" of your Urbit (the context in which your programs run), and you can do this via the filesystem (Clay) or daemons (Gall agents, which have their own state kind of).

But to do this within a simple function, not relying on fancy OS features? It's totally possible, but a huge pain the Aslan.

First, here's our bog-standard fib in Hoon:

  |=  n=@ud
  ?:  (lte n 1)
    n
  %+  add
    $(n (dec n))
  $(n (sub n 2))
Now, I memoize on the way down, by calculating just f(n-1) and memoizing those values, to acquire f(n-2):

  :-  %say
  |=  [* [n=@ud ~] [cache=(map @ud @ud) ~]]
  :-  %noun
  ^-  [sum=@ud cache=(map @ud @ud)]
  =/  has-n  (~(get by cache) n)
  ?~  has-n
    ?:  (lte n 1)
      [n (~(put by cache) n n)]
    =/  minus-1  $(n (dec n))
    =/  minus-2 
      =/  search  (~(get by cache.minus-1) (sub n 2))
      ?~  search  0
      (need search)
    :-  (add sum.minus-1 minus-2)
    (~(put by cache.minus-1) n (add sum.minus-1 minus-2))
  [(need has-n) cache]
and that works in the Dojo:

  > =fib-8 +fib 8
  > sum.fib-8
  21
but it sure is easier in Python! And I'm not picking on Hoon here, it's just pure functional programming that makes you think this way - which as a hacker is fun, but in practice is kinda inconvenient.

I even wonder how much faster I actually made things. Let's see:

  > =old now
  > =res +fib 18
  > sum.res
  2.584
  > (sub now old)
  1.688.849.860.263.936
  :: now with the non-memoized code...
  > =before now
  > +fib 18
  2.584
  > (sub now before)
  1.125.899.906.842.624
Ha! My super improved memoized code is actually slower! That's because computing the copies of the map costs more than just recurring a bunch. This math should change if I try to compute a bigger fib number...

Wait. Nevermind. My memoized version is faster. I tested it with the Unix time command. It's just that Urbit Dojo has a wierd way of handling time that doesn't match my intuition. Oh well, I guess I can learn how that works. But my point is, thinking is hard, and in Python or JS or C I only have to think in terms of values and pointers. And yes, that comes with subtle bugs where you think you have a value but you really have a pointer! But most of the time it's pretty easy.

Btw sorry for rambling on with this trivial nonsense - I'm a devops guy so this is probably super boring and basic for all you master hn swe's. But it's just a tiny example of the constant frustrations I've had trying to do things that would be super simple if I could just grab a reference and modify something in memory, which for better or worse, is how every imperative language implicitly does things.

4 comments

The standard way to expel stated mutable state us to push it into function parameters and returned values.

With Fibonacci numbers, you cab just compute two if them outright:

  def fib_(n: int) -> tuple[int, int]:
    if n == 0:
      return (1, 1)
    prev, this = fib_(n - 1)
    return (this, this + prev)

  def fib(n): return fib_(n)[0]
Now there is no mutable state that survives between function calls, the performance is linear.

With true memoization though accessing a previously computed value.would be constant time.

>The standard way to expel stated mutable state us to push it into function parameters and returned values.

This is precisely what I did in my Hoon solution :) However, I wasn't aware that this approach is the standard way, and I'm glad to have learned that! Thanks

Hoon has magic memoization, though (~+)

    ++  fib
      |=  a=@
      ^-  @
      ~+
      ?:  (lte a 1)  a
      %+  add
        $(a (sub a 2))
      $(a (sub a 1))
Try e.g. (fib 100) (don't try it without the ~+)

The compiled code is itself memoizable at the VM execution level. This memo cache is transient within one system event (i.e., pressing enter after (fib 100) to get the result).

Ha! I read the docs for ~+ (https://docs.urbit.org/language/hoon/reference/rune/sig#-sig...) and fibonacci sequence is actually the example they use!

Thanks so much for this enlightening comment :3

Although I'm curious why Hoon doesn't just detect and cache identical computations by default. I guess it's a tradeoff, since using ~+ is more memory intensive, and you don't always want that either. Especially is Urbit itself is already fairly memory intensive.

There's a lot of reductions that happen during execution, and not many are usefully going to be repeated identically; you'd end up with an extremely large and nearly entirely useless cache if you tried to cache everything. So you hint the VM when it turns out that something does repeat.

What has to repeat is an identical subject (which will have a shape like [argument-to-fib other-local-context standard-library]) and formula (the compiled code for fib). Pretty much the only time this will ever happen is something recursing into itself. Most tail recursion doesn't reevaluate the exact same arguments multiple times. It just so happens that naive fib's exponential self-recursion into itself twice does do that.

So it wouldn't be useful to stick a ~+ on, say, factorial. Except! If you're, say, computing every factorial from 1 to 100 in a loop, it comes in handy - because now you can reuse your computation of, say, 50! when computing 51!, so it's just one multiplication.

But most Nock reductions by volume are not anything that repeats usefully.

Note to self: never code Hoon
Urbit fixes this