| kernighan said in the interview > The main idea in Awk was associative arrays, which were newish at the time, but which now show up in most languages either as library functions (hashmaps in Java or C++) or directly in the language (dictionaries in Perl and Python). Associative arrays are a very powerful construct, and can be used to simulate lots of other data structures. awk was released in 01979, and the authors published this paper in sp&e that year: https://plan9.io/sources/contrib/steve/other-docs/awk.pdf but you see this report version is dated september 01978. but i don't think the report was widely circulated until the next year, when it was included in 7th edition unix as /usr/doc/awk (sudo apt install groff; groff -ms -Tutf8 v7/usr/doc/awk | less -r). it explains: > Array elements may be named by non-numeric
values, which gives awk a capability rather like the
associative memory of Snobol tables. (...) There is an alternate form of the for statement
which is suited for accessing the elements of an associative array: this pdf has evidently been retypeset from the troff sources from the open-source 7th edition release, but without the correct bibliographic database, so the references are missing. a comment in the troff source says: > ....It supersedes TM-77-1271-5, dated September 8, 1977. but possibly that reference is inaccurate python goes beyond merely having dicts 'directly in the language'. python's primary data structure is the dict; among other things, it uses dicts for modules, (most) class instances, associating methods with classes, the locals() user interface to the local-variable namespace, and passing keyword arguments to functions and methods. that is, it uses associative arrays to simulate lots of other data structures, as you are obliged to do in awk, lua, and js. so where did python get dicts? python got dicts (and tuples) from abc, a teaching language which wikipedia claims was started in 01987, 8 years after awk's release, and added conventional arrays (lists) back in. the five data types in abc https://homepages.cwi.nl/~steven/abc/language.html are numbers, strings, compounds (called tuples in ml), lists (really multisets because they're implicitly sorted), and tables (dictionaries), which are awk's 'associative arrays'—and, as in awk, js, lua, and tcl, they're used to provide the functionality of conventional arrays as well however, lambert meertens credits the use of tables in abc to jack schwartz's setl https://inference-review.com/article/the-origins-of-python rather than to awk. he says of the addition of tables to b (the early name for abc, not to be confused with the b that was an earlier version of c) > Having coded a few algorithms in SETL, I had experienced its power firsthand—a power that stemmed entirely from its high-level inbuilt data types. Particularly powerful were sets and maps, also known as “associative arrays,” containing data that can be indexed not only by consecutive integers but by arbitrary values. A programmer could introduce a simple database of quotations named whosaid, in which the value ”Descartes” could be stored in the location whosaid[”cogito ergo sum”]. These high-level types made it possible to express algorithms that required many steps in B1 using just a few steps in SETL. In a clear violation of the Fair-Expectation Rule, B1 allowed only integers as array indices. This design decision had been driven by fear: we had been concerned that aiming too high would make our language unimplementable on the small personal computers that were starting to appear on the market. But Dewar, in particular, convinced me that this meant we were designing for the past, not the future. This led us to redesign the system of data types for our beginners’ language. This time we used only the criteria of ease of learning and ease of use to select candidate systems. The winner turned out to be remarkably similar to the data type system of SETL. The set of possible data type systems to choose from was very large, and to make the process more manageable I had written a program to select the competitive (Pareto-optimal) candidate systems. Interestingly, but quite incidentally, that selection program itself was written in SETL. The winning type system became that of B2, and remained unchanged in the final iteration, released in 1985 under the name “ABC.” 'associative arrays', of course, is the term used by awk this story of adding associative arrays to abc only for b2 is somewhat complicated by the fact that the version of b (b1?) in meertens's 01981 'draft proposal for the b programming language' https://ir.cwi.nl/pub/16732 already includes tables, three years after the release of awk as part of 7th edition; p. 6 (9/91) says, > Tables are somewhat like dictionaries. A short English-Dutch dictionary (not sufficient to maintain a conversation) might be (...) Table entries, like entries in a dictionary, consist of two parts. The first part is called the key , and the second part is called the associate. All keys must be the same type of value, and similarly for associates. A table may be written thus: {[’I’]: 1; [’V’]: 5; [’X’]: 10}. > If this table has been put in a target roman, then roman[’X’] = 10. note that this is also awk's syntax for indexing an associative array, though it doesn't have a syntax for writing one down. (to be continued; hn says, 'that comment included too many facts') |
Wow, now that I'd like to see.
Awesome and informative comments, cheers!