A list structure is already a 'dictionary tree', except it has lousy performance in access. (only really visible on large data sets).
* (second (assoc 5 '((1 2) (3 4) (5 6))))
> 6
The lousy performance doesn't matter, most of the time, because you have other data-structures for data, and your compiler generally is only interested in sequential access (constructing a parse tree out of it).
The lisp code already being a tree makes parsing fairly simple...
So, I guess my point is that I don't see the point of this?
The only reason to really upgrade to a dictionary would be if you wanted to have non-sequential access to the contents of of your large-scale code-data-structure... generally the lists aren't large enough to merit it. You can do all of the dictionary operations on a list, and they are 'fast enough'.
----
I feel like the rationale here is created by conflating the idea of a cons and a list. I think generally, a list already accomplishes what the author sets out to accomplish.
They are a single step higher level than conses, and give you a hierarchical structure with the possibility for named associations. I don't really see how using something more complicated and memory intensive to build this basic data structure qualifies as an improvement...
> The lousy performance doesn't matter, most of the time, because you have other data-structures for data, and your compiler generally is only interested in sequential access ...
Exactly. Much of the time you don't care. The rest of the time, you can still use assoc-lists, except you create a branching structure on the keys such that no key in the list is a prefix of any other. The first entry of a node contains the value "at" that node, and the tail of the node is a list of branches, where each branch is a pair consisting of a string key and a child node.
The "get" and "put" functions in this structure are very simple recursive functions, and you can maintain very large dictionaries ("maps") this way. The get and put operations are essentially O(1), constant time, if you ignore the sheer length of the keys (which are usually reasonably short anyway). I've done tests where I insert hundreds of thousands of pseudo-random keys and values, and read them back, all very quickly and reliably. The code splits the keys into branching structures automatically. It's also a purely functional structure, so you can keep old versions of the maps.
The value at each node can be anything you like, not just a string, though to detect the absence of a value you need to use a "Maybe" type. In Fexl you can define constructors:
So then the default value of a newly created branch is always absent, but a "put" operation replaces it with (present value). When you delete a key, it automatically detects if the node is empty (i.e. its value is absent and it has no branches) and then eliminates the node from its parent.
You also detect singleton nodes, which have an absent value and only one branch. In that case you can concatenate the key of that branch with the parent key upstairs, eliminating the extra branch.
not directly lisp related, but the team working on Fortress has some interesting work on using "conc lists" (pairs, and trees as a combination of pairs) as a basic unit of work for parallel processing[1], the analysis quite naturally considers its relation with lisp ad cons lists in other languages.
I think that's directly related. The answer to the question "Why should we replace our lists?" is that you can avoid known failure cases with lists at a relatively small price in complexity and memory in the worst case, and sometimes get big gains. Singly-linked lists are not suitable for a concurrent world and functional programming really needs to stop putting lists on a pedestal and writing them into their very syntax. There's actually other problems they cause, too, but the concurrent problem is a real killer, because you can not just program your way around it, you need a fundamentally different data structure.
A better reason to replace lists with trees might be for parallel execution. I enjoyed this talk by Guy Steele on replacing lists with trees in Fortress:
http://vimeo.com/6624203
To take a familiar example, the Web is not a tree. While you might argue that each page has unique children (outlinks), there is no unique parent -- a page can have an arbitrarily large number of incoming links, and the page can link back to those pages in turn, producing cyclic structures of immense complexity.
For some interesting discussion on the limits of tree structures with respect to urban planning see A City is Not a Tree by Christopher Alexander (who is also the inspiration for the Design Patterns movement in software engineering). http://www.rudi.net/node/317
The structures described are called trees in the post, but are actually just sets of named objects, which can be used to describe cyclic structures in the same way conses can.
The structure he calls a 'tree' allows cycles and is a generalization of a list. Whether he uses the correct word isn't really interesting: it doesn't detract anything from the post.
It's easy to implement both lists and name-value associations in terms of pairs, so I'm not sure why they need to be built in.
In Fexl (http://fexl.com/code), the only built in concept is the function, and all data is structured out of functions. I won't go into all the details here, but concepts like pair, list, key-value assoc, branch, and everything else are readily expressed as pure functions. So-called "circular" structures are created with the Y combinator, so they really are not circular in memory, only in concept.
The seed was planted back in 1989 when I read a paper by Jørgen Steensgaard-Madsen in Communications of the ACM titled "Typed Representation of Objects by Functions" (CACM, January 1989, Volume 11 Number 1). I found it brilliant and I never could shake the idea after that.
Even something as lowly as a bit (Boolean value true or false) can be represented as a function. In Fexl, the functions for T and F are expressed as:
\T = (\T\F T)
\F = (\T\F F)
In short, the T function returns its first argument, and the F function returns its second argument. So you can do things like this:
eq x 4 (print "Yes, it's 4") (print "No, it's not 4")
You don't even need an "if" function. Or, if you like the way "if" looks, you can define it as the identity function:
\if = (\x x)
Then you can say:
if (eq x 4) (print "Yes, it's 4") (print "No, it's not 4")
You can also define "lists" as functions. There are two cases, "null" and "cons":
Now if you have a list called "groceries", you can do this:
groceries
(print "You don't need anything at the moment")
\head\tail
print "You need "; print head;
print "and possibly some other things."
If you aren't comfortable "calling" a list object as a function, you can define the more familiar "observer" functions:
\empty = (\list list T \_\_ F)
\head = (\list list undef \head\_ head)
\tail = (\list list undef \_\tail tail)
Then you could say:
if (empty groceries)
(print "You don't need anything at the moment")
(print "You need "; print (head groceries);
print " and possibly some other things.")
If you want an ordered pair, here you go:
\pair = (\x\y \pair pair x y)
Now you can say:
\the_pair = (pair 2.6 3.8)
...
the_pair \x\y
print "left is "; print x;
print "right is "; print y;
So, to sum it all up, I am just irresistibly attracted to the utter profundity of this concept. I like how the components of a piece of data simply "present themselves" to the handler function(s) applied to it. Also, you can name your data constructors anything you like -- there are no ultimately predefined names for anything, except any primitive combinators built into the C code, but even these can be shadowed or hidden. So if you want to use "item" and "end" instead of "cons" and "null", you can. (I do.) And if you want to use "first" and "rest" instead of "head" and "tail", you can.
If you like, you can download the code at http://fexl.com/code . One of these days I should put a sandboxed demo interpreter up on the site.
Although I keep hearing “everything is a function” (and 3 is a nullary or constant function), I don’t hear people say “everything is a list”, and 3 is really the singleton list [3].
Well, in musings around Arc, pg did mention defining everything as a list, including 3 as [[] [] []], I believe. So, it's not that no one says it. :)
Although Fexl does have built-in functions for long int and double float values, it is possible to do without them.
There are a couple of ways to define integers as pure functions. One is unary notation, which is equivalent to the "3" example you cite above. So you'd have these two constructors:
Or, using the ';' (right-pivot) to avoid nesting on the right (though sometimes that can be too clever by half):
\add == (\x\y x y \n succ; add n y)
That works fine for some applications. But you can use binary notation as well, defining an integer as a list of bits, with the lowest-order bit first. Then the "add" function becomes:
\add == (\x\y x y \bx\nx y x \by\ny
\sum=(add nx ny)
bx
(by (cons F (inc sum)) (cons T sum))
(cons by sum))
Where inc is:
\inc == (\x x (cons T null) \b\n
b (cons F (inc n)) (cons T n))
I think I transcribed that right, but I haven't tested it in a while. It's all surprisingly fast -- I was actually doing long division of 100 digit numbers years ago in reasonable time in this notation, including the conversion to decimal notation by repeated division by 10.
One of these days I need to link a big number library into Fexl, but it hasn't been a priority for me. (Actually I need to think seriously about dynamically linked libraries, with linkage control done in Fexl itself. This can be sandboxed for secure applications, of course.)
Oh and here's my old division function, which again I haven't tested in a while, so no warranties expressed or implied:
#---------------------------------------------------------------------------
# (nat:div x y) divides x by y. It yields a pair <q,r>, where q is the
# quotient and r is the remainder.
#
# The result satisfies the equation x = q*y + r, 0 <= r < y.
#
# NOTE: If you divide by zero, the function yields the pair <0,0>.
#---------------------------------------------------------------------------
\nat:div==(\x\y\:
x (: null null) \bx\nx
y (: null null) \by\ny
by
(
# divide by odd - recur on x only
nat:div nx y \q\r
\r=(bx nat:2x1 nat:2x r)
\d=(nat:sub r y)
int:ge0 d
(: (nat:2x1 q) (int:abs d))
(: (nat:2x q) r)
)
(
# divide by even - recur on x and y
nat:div nx ny \q\r
: q (bx nat:2x1 nat:2x r)
)
)
Lisp is already made of trees. A Lisp program is a pre-order traversal of the abstract syntax tree of a yacc'ed program in another programming language. This fact is WHY Lisp is powerful. By eliminating yaccing you can create new syntax on the fly without having to recompile your compiler.
By "replacing" lists with trees, am I supposed to understand that working with trees will be natural and easy, and lists will feel like second-class citizens? I thought we were beyond thinking like that. Building languages around a favorite data structure was never the right thing to do; it was always tunnel vision and bad design. Different data structures have different features and different performance characteristics. There's no reason for a language designer to pick a favorite data structure and force anyone who has to use a different one to deal with warts and second-class library and language support.
I wouldn't say lists should be replaced with trees, but adding a rich set of tree-based data structures to Lisp improves it substantially, in my view. To see what I mean, have a look at my FSet functional collections package, which does exactly that:
http://common-lisp.net/project/fset/
Unlike what the OP is suggesting, FSet doesn't expose the internal structure of its trees. Instead it wraps them in a rich set-theoretic interface.
Cons cells give you binary trees. It is possible to encode any arbitrary tree as a binary tree. However, this does not mean that the expression of algorithms over such left child right sibling encoded trees is identical to one over trees with some higher branching factor. So the cons/binary situation will at the least require some additional layers of abstraction to simulate various traversal patterns of the n way tree on the encoding binary tree.
There also physical/efficiency concerns. There's a reason databases use highly branching trees.
And you can create the abstractions necessary to work with such a data structure and add them straight to the regular grammar of the language.
Lisp also has all of the other low-level primitives we're used to when working closer to the machine. Structs, indexed arrays, hashes, etc. If efficiency is a concern, just pick an implementation that compiles to machine code. SBCL is pretty good for this depending on your target platform.
A few people have asked, what is the point? since trees and associative access can just be implemented with lists.
It is the same basic point as having 'let' or 'case' etc. (which can all just be implemented with 'lambda' and 'if' etc.) -- it is just a bit higher-level to program with.
With something that makes programming a little easier (maybe), probably the question ought really be, why shouldn't we have it?
In my way of thinking, the distinction between "library" and "built-in" is very fluid. That is, if you run an executable which automatically loads a library for you, then voila it is now built-in. If nothing else, you can set up your own local environment that you favor. I realize this has portability implications, but keep in mind that not everyone will want "bignums" or "socket operations" or "openssl" available all the time in every program.
For me, the language implications are nil. I would not change a single thing in Fexl to support name-value "trees" versus lists. It's all programmable using the system as it is. The S, C, Y, I, L, R combinators are what they are, forever and ever, world without end. :) And I'm quite sure the same thing is true of Lisp (i.e. just use defun or maybe defmacro). You'll notice that even in the original article the author didn't propose any new syntax. It's all just more functions.
I mean, even in Perl for goodness sake all you have to do is this:
# A map is a list of branches. Each branch is pair(key,data). The data is
# pair(val,map). No two keys in a map have any prefix in common. A key cannot
# be null. If you put a null val at a key, it deletes that key from the map.
sub map_put
{
my $map = shift;
my $key = shift;
my $val = shift;
die if !defined $key || ref($key) ne "" || $key eq "";
die if !defined $val;
if (is_atom($map))
{
return $map if is_null($val);
return pair(pair(atom($key),pair($val,null())), $map);
}
my $branch = left($map);
my $old_key = name(left($branch));
my $cmp = $key cmp $old_key;
if ($cmp == 0)
{
my $sub_map = right(right($branch));
if (is_null($val))
{
return right($map) if is_atom($sub_map);
if (is_atom(right($sub_map)))
{
# Combine singleton map upstairs.
my $top_key = name(left($branch));
my $next_key = name(left(left($sub_map)));
my $new_branch = pair(atom($top_key.$next_key),
right(left($sub_map)));
return pair($new_branch,right($map));
}
}
my $new_branch = pair(left($branch), pair($val,$sub_map));
return pair($new_branch,right($map));
}
if (substr($key,0,1) eq substr($old_key,0,1))
{
my $len_common = len_common_prefix($key,$old_key);
my $key_suffix = substr($key,$len_common);
my $old_key_suffix = substr($old_key,$len_common);
my $common_prefix = substr($key,0,$len_common);
if ($key_suffix eq "")
{
return $map if is_null($val);
my $sub_map =
pair(pair(atom($old_key_suffix),right($branch)),null());
my $new_branch = pair(atom($common_prefix),pair($val,$sub_map));
return pair($new_branch,right($map));
}
if ($old_key_suffix eq "")
{
my $old_sub_map = right(right($branch));
my $new_sub_map = map_put($old_sub_map,$key_suffix,$val);
my $old_val_atom = left(right($branch));
if (name($old_val_atom) eq "")
{
return right($map) if is_atom($new_sub_map);
if (is_atom(right($new_sub_map)))
{
# Combine singleton map upstairs.
# TODO hey wait! This is the same code as above. Unify this.
my $top_key = name(left($branch));
my $next_key = name(left(left($new_sub_map)));
my $new_branch = pair(atom($top_key.$next_key),
right(left($new_sub_map)));
return pair($new_branch,right($map));
}
}
my $new_branch = pair(left($branch),
pair($old_val_atom,$new_sub_map));
return pair($new_branch,right($map));
}
return $map if is_null($val);
my $old_branch = pair(atom($old_key_suffix),right($branch));
my $sub_map = map_put(pair($old_branch,null()),$key_suffix,$val);
my $new_branch = pair(atom($common_prefix),pair(null(),$sub_map));
return pair($new_branch,right($map));
}
if ($cmp < 0)
{
return $map if is_null($val);
return pair(pair(atom($key),pair($val,null())), $map);
}
return pair($branch,map_put(right($map),$key,$val));
}
sub map_get
{
my $map = shift;
my $key = shift;
die if !defined $key || ref($key) ne "" || $key eq "";
return null() if is_atom($map);
my $branch = left($map);
my $old_key = name(left($branch));
my $cmp = $key cmp $old_key;
return left(right($branch)) if $cmp == 0;
if (substr($key,0,1) eq substr($old_key,0,1))
{
my $len_common = len_common_prefix($key,$old_key);
my $key_suffix = substr($key,$len_common);
my $old_key_suffix = substr($old_key,$len_common);
return map_get(right(right($branch)),$key_suffix)
if $old_key_suffix eq "";
return null();
}
return null() if $cmp < 0;
return map_get(right($map),$key);
}
That would be a lot cleaner in Fexl or Lisp, but it's totally do-able in any language.
You provided two things that the original post is a reinvention of. Which is it? Why not read the article enough to decide which it is a reinvention of, then present your thinking in your comment?
No, trees are multidimensional lists. Lists are one-dimensional lists, which are degenerate trees and a simple primitive for the more general (always a loaded term) structure.
* (second (assoc 5 '((1 2) (3 4) (5 6))))
> 6
The lousy performance doesn't matter, most of the time, because you have other data-structures for data, and your compiler generally is only interested in sequential access (constructing a parse tree out of it). The lisp code already being a tree makes parsing fairly simple...
So, I guess my point is that I don't see the point of this?
The only reason to really upgrade to a dictionary would be if you wanted to have non-sequential access to the contents of of your large-scale code-data-structure... generally the lists aren't large enough to merit it. You can do all of the dictionary operations on a list, and they are 'fast enough'. ----
I feel like the rationale here is created by conflating the idea of a cons and a list. I think generally, a list already accomplishes what the author sets out to accomplish.
They are a single step higher level than conses, and give you a hierarchical structure with the possibility for named associations. I don't really see how using something more complicated and memory intensive to build this basic data structure qualifies as an improvement...