Hacker News new | ask | show | jobs
by gohrt 4365 days ago
This made me laugh:

http://www.wolfram.com/mathematica/new-in-10/key-value-assoc...

Version 10 introduces fundamental new constructs called associations. They associate keys with values, allowing highly efficient lookup and updating, even with millions of elements. Associations provide generalizations of symbolically indexed lists, associative arrays, dictionaries, hashmaps, structs, and a variety of other powerful data structures.

2 comments

A very broad generalization of hash tables has been built into the language since day 1 so this feature is mostly for people who just want a structure that they are already used to (or don't want the extra behavior for performance reasons or whatever).

Hash-tables in previous versions of Mathematica:

    >mymap["abc"]=1
    >mymap[2]=2
    >mpmap[hello]=3
    >mymap["abc"]
    1
    >mymap[2]
    2
    >mymap[hello]
    3
Whoa:

    >myDerivative[a_ x_^n_] = n a x^(n - 1)
    >myDerivative[1.1y^3]
    3.3 y^2
More specifically, the the previous hash-table-like method is just assigning constants as values for a function.

    In[1]:= mymap["abc"]=1;
            mymap[2]=2;
            mpmap[hello]=3;
            DownValues[mymap]
    Out[4]= {HoldPattern[mymap[2]]:>2,HoldPattern[mymap[abc]]:>1}
Defining a function is really just a pattern is matched based on arguments, and is replaced with an expression with the arguments substituted into it. And you can set a constant key argument to a value of a constant value — creating the odd built in hash-table-alike which has been around forever.

That said, Association[] looks nice. Presumably it gets a speed improvement by bypassing the pattern matching. And I remember being annoyed by using JSON data in Mathematica structures, this looks nicer. The literal syntax is appreciated too. Actually I'm kind of surprised they didn't choose another weird unicode symbol like 〚, for the literal syntax.

Well, unlike adding and changing definitions for symbols in the symbol table, associations are actually immutable. So you can pass them around as values, mutate them, and the old versions are still available unchanged. That's just not true of using downvalues of symbols.

Also, using definitions on symbols to store key-values is incredibly limiting, because you can't do reverse lookups without getting into very low level (and slow) hackery via DownValues. Or list the keys or values. Or even know how many keys you have.

Associations really will replace basically all uses of symbol downvalues for storing key-value pairs. And Associations have really nice general-purpose functions like GroupBy, Counts, Merge, and so on (mentioned below).

Associations even support function application syntax, so you should be able to drop them in into existing code that expects symbol downvalues as you're using them above.

Associations are the single most significant language-level improvement we've had in many versions. And we've integrated them really nicely across the system -- they're not just some special-purpose data structure.

Yeah, using definitions on symbols to store key-values is indeed incredibly limiting.

Association seems very similar to List: immutable, really good Part syntax for accessing things in deeper levels, etc. I've started to read some of the docs on it too, it looks very nice.

Shades of DEF FN from Applesoft BASIC!
Sounds like a hashtable ... is that what made you laugh ?
I think adding hashtables to a high-level language in 10th release, 26 years after its inception, made him/her laugh.

As a note: IIRC, SWI-Prolog got hash tables in their 7th release, so some fundamental data structure can be added to a language when it's seen as not needed by the language developers.

To be fair, fast immutable hash tables have only been around as a technology since around 2005. (Yes, Mathematica is mostly based on immutability) .Other than internal R&D, Wolfram never add experimental methods to Mathematica.

Of course, its always had lists of rules as a far more general (though much slower) form of the hash table concept.

I assume you mean persistent, not immutable.

    a[x] = 1
    b = a
    a[y] = 2  <---- mutates a
    a[x] = 3  <-- is this legal?
    b[x]    <--- persistent- uh, what is Mathematica's semantics here?
Yes, Associations use the same basic algorithm as Scala and Clojure's hashmaps, and are indeed persistent.

And b[x] there gives 1 as you would expect.

Do you know when that change was put into place? I have been using Mathematica sense version 5 and my intuition was that `b[x] == 3`. Testing in mathematica 8 which I have on hand confirms this.
No, I meant lists of rules like {a->1,b->2}, which capture the same kind of patterns as in the previous example but in an immutable (and copy-on-write) way.
just to clarify, you mean fast immutable hash tables that support insertion right? (Which I agree is hard) Otherwise you can just use a BST or something and it's fast enough for lookups.
Associations aren't just a data structure, they've been designed to fit in a sensible way into the rest of the language, via a principle I call the "central dogma". This means they work in a predictable way with a huge number of existing functions (though we still have more to do).

For example, Associations interact naturally with the hierarchical part-specification language used by Part (http://reference.wolfram.com/language/ref/Part.html):

   In[1]:= people = { 
      <|"name" -> "bob", "age" -> 20, "sex" -> "M"|>, 
      <|"name" -> "sue", "age" -> 25, "sex" -> "F"|>, 
      <|"name" -> "ann", "age" -> 18, "sex" -> "F"|>
   }; 
   
   In[2]:= people[[ All, "age" ]] (* extract list of ages *)
   Out[2]= {20, 25, 18}

   In[3]:= people[[ All, "sex" ]] (* extract list of sexes *)
   Out[3]= {"M", "F", "F"}

   In[4]:= people[[ 2, "age" ]] (* extract age of 2nd person *)
   Out[4]= 25

   In[5]:= people[[ 2, {"age","sex"} ]] (* extract age and sex *)
   Out[5]= <|"age" -> 25, "sex" -> "F"|>
This naturally generalizes to 'indexed tables', in which the outermost list becomes an association, because associations serve double-duty as "structs" and "hash-maps", just like lists are used for both "vectors" and "tuples":

   In[6]:= people = <|
      236234 -> <|"name" -> "bob", "age" -> 20, "sex" -> "M"|>, 
      253456 -> <|"name" -> "sue", "age" -> 25, "sex" -> "F"|>, 
      323442 -> <|"name" -> "ann", "age" -> 18, "sex" -> "F"|>
   |>; 
   
   In[7]:= people[[ All, "age" ]] (* extract association between ID and age *)
   Out[7]= <| 236234 -> 20, 253456 -> 25, 323442 -> 18|>

   In[8]:= people[[ All, "sex" ]] (* extract association between ID and sex *)
   Out[8]= <| 236234 -> "M", 253456 -> "F", 323442 -> "F"|>

   In[9]:= people[[ Key[323442], "age" ]] (* extract age of person with ID 323442 *)
   Out[9]= 18

   (* extract age and sex of person with ID 323442 *)
   In[10]:= people[[ Key[323442], {"age","sex"} ]] 
   Out[10]= <|"age" -> 18, "sex" -> "F"|>

The uniform addressing scheme behind Part (and Extract, Position, etc) is tremendously useful in day-to-day code, because it makes it much easier to write programs as functions that transform potentially complex, hierarchical data in a series of steps.

This is similar in some ways to the ideas behind Haskell's lens library, Clojure's assoc-in and friends, even the schemes used in JQuery and XPath. But it's core to WL.

The semantics of Part are also extended to become a full-fledged query language, as used by Dataset (http://reference.wolfram.com/language/ref/Dataset.html):

   (* load a dataset of passengers of the Titanic *)
   titanic = ExampleData[{"Dataset", "Titanic"}]

   (* produce a histogram of passenger ages *) 
   titanic[Histogram, "age"] 

   (* produce a histograms for 1st class, 2nd class, etc.. *)
   titanic[GroupBy[Key["class"]], Histogram[#, {0,80,4}]&, "age"]  
There are also some really nice functions to work with associations, like the map-reduce-like GroupBy (http://reference.wolfram.com/language/ref/GroupBy.html):

   (* split sentence into list of words *)
   In[16]:= words = StringSplit["it was the best of times it was the worst of times"] 
   Out[16]= {"it", "was", "the", "best", "of", "times", "it", "was", 
      "the", "worst", "of", "times"}

   (* group words that have the same length *) 
   In[17]:= GroupBy[words, StringLength] 
   Out[17]= <|
      2 -> {"it", "of", "it", "of"}, 
      3 -> {"was", "the", "was", "the"}, 
      4 -> {"best"}, 
      5 -> {"times", "worst", "times"}
   |>
   
   (* reduce each group into an association of counts *)
   In[18]:= GroupBy[words, StringLength, Counts] 
   Out[18]= <|
      2 -> <|"it" -> 2, "of" -> 2|>, 
      3 -> <|"was" -> 2, "the" -> 2|>, 
      4 -> <|"best" -> 1|>, 
      5 -> <|"times" -> 2, "worst" -> 1|>
   |>
And Counts and CountsBy (http://reference.wolfram.com/language/ref/CountsBy.html):

   In[21]:= CountsBy[words, StringLength]
   Out[21]= <|2 -> 4, 3 -> 4, 4 -> 1, 5 -> 3|>
And AssociationMap (http://reference.wolfram.com/language/ref/AssociationMap.htm...):

   In[23]:= AssociationMap[WordData[#, "PartsOfSpeech"]&, words]
   Out[23]= <|
      "it" -> {"Pronoun"}, "was" -> {"Verb"}, 
      "the" -> {"Determiner"}, 
      "best" -> {"Noun", "Adjective", "Verb", "Adverb"}, 
      "of" -> {"Preposition"}, "times" -> {"Noun"}, 
      "worst" -> {"Noun", "Adjective", "Verb", "Adverb"}
   |>
Here's some more info about associations: http://reference.wolfram.com/language/guide/Associations.htm...