Hacker News new | ask | show | jobs
by fexl 5515 days ago
> The lousy performance doesn't matter, most of the time, because you have other data-structures for data, and your compiler generally is only interested in sequential access ...

Exactly. Much of the time you don't care. The rest of the time, you can still use assoc-lists, except you create a branching structure on the keys such that no key in the list is a prefix of any other. The first entry of a node contains the value "at" that node, and the tail of the node is a list of branches, where each branch is a pair consisting of a string key and a child node.

The "get" and "put" functions in this structure are very simple recursive functions, and you can maintain very large dictionaries ("maps") this way. The get and put operations are essentially O(1), constant time, if you ignore the sheer length of the keys (which are usually reasonably short anyway). I've done tests where I insert hundreds of thousands of pseudo-random keys and values, and read them back, all very quickly and reliably. The code splits the keys into branching structures automatically. It's also a purely functional structure, so you can keep old versions of the maps.

The value at each node can be anything you like, not just a string, though to detect the absence of a value you need to use a "Maybe" type. In Fexl you can define constructors:

  \absent  = (       \absent\present absent)
  \present = (\value \absent\present present value)
So then the default value of a newly created branch is always absent, but a "put" operation replaces it with (present value). When you delete a key, it automatically detects if the node is empty (i.e. its value is absent and it has no branches) and then eliminates the node from its parent.

You also detect singleton nodes, which have an absent value and only one branch. In that case you can concatenate the key of that branch with the parent key upstairs, eliminating the extra branch.