Hacker News new | ask | show | jobs
by GnarfGnarf 1453 days ago
Can someone clarify something for me? If you build a linked list or a dynamically growing array in a Functional program, am I correct in understanding that the array is never modified, instead a copy is made and the new element is added to the copy of the array?
2 comments

Immutable, fixed size arrays are pretty easy to handle in a functional style, of course, but most functional languages only discourage mutability while not outright forbidding it, so you'll usually have access to mutable dynamically growing arrays that work the exact same way as you expect them to.

Because functional languages do discourage mutation, they tend to fallback on an implementation of linked lists where the lists are never modified (You can google for Persistent Data Structures for more). Because the list never changes, you can have two lists share a tail, by building on two different heads atop that tail (like a git branch of sorts, minus the merging).

If you want to build a pure language, you can build immutable dynamic-sizeable arrays with a copy-on-write setup. Growing the array keeps building on top of the existing buffer, if you outgrow it or need to actually mutate it, you make a copy.

If you want both purity and actual full-on mutability, e.g. Haskell has Data.Array.MArray, which locks the array behind a mutable context (like IO), so your pure code never sees any mutation.

The result should be that after the addition you should not have affected the old array yes. The most basic way of implementing this is what you described. However, there are a number of different ways to optimize this. Since you know that the elements are immutable, if you add the new element to the start of the list you could just point it to the old elements and you now have two lists which share the majority of their elements. If you are interesting to learn about this more I would recommend looking into how for example Clojure implement their "persistent data structures". Most functional languages have similar things so you could find it elsewhere as well if you want.
So... what happens when my array grows to 2GB, and I have millions of copies because I added millions of entries?
1. You switch to a pragmatic language that gives you an escape hatch to mutate data.

2. You use a persistent data structure that lets it grow without needing to make millions of copies for millions of new entries.

3. You use a different constructor pattern to avoid the allocations.

For (3), a common way around this (with lists) would be to build it in reverse and then reverse it (assuming that the order actually mattered at all, if it doesn't you don't need to reverse it at the end). This is done in Erlang, for instance, as a common pattern:

  make_something_n_times(0, Acc) -> reverse(acc);
  make_something_n_times(N, Acc) -> make_something_n_times(N-1, [f(N) | Acc]).
You end up with two copies of the list, one in the constructed order and the reverse ordered version, but the constructed order one will be garbage collected in short order. Two copies, better than millions of copies.

You also see a pragmatic solution in Erlang with iolists. These are lists that contain items that you'd want to send to, well, IO functions. But instead of forcing you to allocate whole new strings for concatenation (a common thing to do with strings) like this:

  S1 ++ S2 %% results in allocating a new string and copying contents from at least S1
You can do this:

  Concatenated = [S1,S2]
Now it's a two-element list that references the two prior ones, you've allocated some new memory, but just enough for a new list. Now you have a third string you want to prepend?

  [S3,Concatenated]
Again, minimal amount of allocation and copying (there is no copying). You can use this pattern in other situations and only flatten when needed, or "flatten" it by recursing over the structure to access all the elements but never actually constructing a flattened version.
OK, thanks for the detailed answers. I am trying hard to learn what I can about FP. I've only been doing this programming thing since 1965.

Is there an FP equivalent of the C++ STL (Standard Template Library)?

So when you ask about the STL I can interpret that in several ways:

1. Good sized standard library for many common tasks, yes. You won't have to reinvent the wheel using functional programming languages. Though the size and scope of their standard libraries will vary. And when it's not in their standard library, code reuse is a thing and most have decent to good package managers for obtaining what amounts to community-standard solutions to problems not covered by the language standard.

2. Generic data structures. Definitely yes. Either by virtue of being dynamically typed (Erlang, the Lisp family) or because parametric polymorphism (roughly analogous to C++'s generics) has been a standard thing for 4 or 5 decades for the statically typed ones (ML family, Haskell).

3. Generic algorithms. See (2).