Hacker News new | ask | show | jobs
by mmiller 3191 days ago
I think your analogy is going in a better direction. I had the same idea after taking a look at Squeak for a while. What would need to be added to your analogy is a notion of design. You see, in Smalltalk, for example, your programming takes place in the messages. So, even the daemons (which, for the sake of argument, we could think of as analogies to objects), which would be the senders and receivers of messages would be made out of the same stuff, not C/C++. So, this is a pretty dramatic departure from the way Unix operates. What this should suggest is that the semantic connection between objects is late-bound. Think of them as servers.

Secondly, in the typical Smalltalk implementation, there is still compilation, but it's incremental. You compile expressions and methods, not the whole body of source code for the whole system. What's really different about it is since the semantic actions are late-bound, you can even compile something while a thread is executing through the code you're compiling. So, you get nearly instant feedback on changes for free. Bret Victor's notion of programming environments blurs the axes you're talking about even more, so that you don't have to do two steps to see your change, while a thread is running (edit, then compile). You can see the effect of the change the moment you change an element, such as the upper bound of an iterative loop. To make it even more dynamic, he tied GUI elements (sliders) to such things as the loop parameters, so that you don't have to laboriously type the values to try them out. You can just change the slider, and see the effects of the change in a loop's range very quickly, such that it almost feels like you're using a tool-based design space, rather than programming.

I don't know how this was done in ST-78, but in ST-80, at least in accounts I've heard from people who've used versions of it, and in the version of Squeak I've used from time to time, the source code is not technically stored with the class object, though the system keeps references to the appropriate pieces of source code, and their revisions, mapping them to the classes in the system, so that when you tell the system you want to look at the source code for a class, it pulls up the appropriate version of the code in an editor.

Source code is stored in a separate file, and Smalltalk has a version control system that allows reviewing of source code edits, and reversion of changes (undo). The class object typically exists in the Smalltalk image as compiled code.

There are many things that are different about this vs. what you typically practice in CS, but addressing your point about data, in OOP, objects are supposed to take the place of data. In OOP, data contains its own semantics. It inverts the typical notion of procedures acting on data. Instead, data contains procedures. It's an active "live" part of the programming that you do. So, yes, data is persisted, along with its procedures.

A simple example in Smalltalk is: 2 + 2. If we analyze what's going on, the "2"'s are the pieces of data, the objects/servers, and "+" is used to reference a method in one of the data instances, but it doesn't stop there. The "2" objects communicate with each other to do the addition, getting the result: 4. As you can tell implicitly, "2 + 2" is also source code, to generate the semantic actions that generate the result.

2 comments

"More or less" ... I can see that it's hard after decades of "data-centric" perspectives to think in terms of "computers" rather than "data", and about semantics rather than pragmatics. It's not "data contains procedures" but that objects are (a) semantically computers, and are impervious to attack from the outside (a computer has to let an attack happen via its own internal programming), and (b) what's inside can be anything that is able to deal with messages in a reasonable way. In Smalltalk, these are more objects (there are only objects in Smalltalk). The way the internals of typical Smalltalk objects are organized could be done better (we used several schemes, but all of them had variables and methods (which were also objects).

So "2" is not "data" in Smalltalk (unfortunately, it is in Java, etc.)

We had planned that the interior of objects should be an "address space of objects" because this is a better and more recursive way to do modularization, and to allow a different inter-viewing scheme (we eventually did some of this for the Etoys system for children about 20 years ago). But the physical computers at Parc were tiny, and the code we could run was tiny (the whole system in the Ted Nelson demo video was a little over 10,000 lines of code for everything). So we stayed with our top priority: to make a real and highly interactive system that was very comprehensive about what it and the user could do together.

Taking your feedback into consideration, I had the thought that it would be more accurate to talk about things like "2" in an OOP context as symbols with semantics (which provide meaning to them), not data, since "data" connotes more a collection of inputs/quantities, where we may be able to attach a meaning to it, or not, and that wasn't what I was going after. I was going after a relationship between information and semantics that can be associated with it, but trying to provide a transition point from the idea of data structures to the idea of objects, for someone just learning about OOP. Doing a sleight of hand may not do the trick.

My starting point was to use a very interesting concept, when I encountered it, in SICP, where it discussed using procedures to emulate data, and everything that can be done with it. It seemed to help explain for the first time what "code is data" meant. It illustrated the inversion I was talking about:

https://tekkie.wordpress.com/2010/07/05/sicp-what-is-meant-b...

"In 2.1.3 it questions our notion of “data”, though it starts from a higher level than most people would. It asserts that data can be thought of as this constructor and set of selectors, so long as a governing principle is applied which says a relationship must exist between the selectors such that the abstraction operates the same way as the actual concept would." It went on to illustrate how in Lisp/Scheme you could use functions to emulate operations like "cons", "car", and "cdr", completely in procedural space, without using data structures at all.

This is what I illustrate with "2 + 2", and such, that code is doing everything in this operation, in OOP. It's not a procedure applied to two operands, even though that's how it looks on the surface.

Yes, the SICP follows "simulate data" ideas much further back in the past, including the B5000 computer, and especially the OOP systems I did. But the big realization is that there are very few things that are helpful when they are passive, and the non-passive parts are the unique gift of computing. The question is not whether ideas from the past can be simulated (easy to see they can be if the building blocks are whole computers) but what do we "mean by 'meaning' "?

Good answers to this are out of the scope of HN, but we should be able to imagine -processes- moving forward in both our time and in simulated time that can answer our questions in a consistent way, and can be set to accomplish goals in a reasonable way.

I was using "data" in the spirit of a saying I heard many years ago in CS, that, "Data is code, and code is data." It seems that people in CS are still familiar with this phrase. I was focusing on the latter part of that phrase. I was trying to answer the question that I think is often implied once you start talking to people about real OOP, "What about data?" I almost don't like the term "data" when talking about this, because as you say, it gets one away from the focus on semantics, but whenever you're talking to people in the computing field, such as it is, I think this question is unavoidable, because people are used to thinking of code and information as separate, hence the notion of data structures. People need a way to translate in their minds between what they've done with information before, and what it can be. So, I used the term "data" to talk about "literal objects" (like "2", or other kinds of input), but I was using the description of "processors" (ie. computers), "containing procedures," which can also be thought of as "operators."

I think the idea of an "inversion" is quite apt, because as you've said before, the idea of data structures is that you have procedures acting on data. With real objects, you still have the same essential elements in programming, the same stuff to deal with, but the kinds of things programmers typically think about as "data" are objects/computers in OOP, with intrinsic semantics. So, you're still dealing with things like "2", just as procedures acting on data do, but instead of it being just a "dead" symbol, that can't do anything, "2" has semantics associated with an interface. It's a computer.

> We had planned that the interior of objects should be an "address space of objects" because this is a better and more recursive way to do modularization

Something that nags me in the back of my mind is that messages are not just any object, they always have the selector attached. Why not let objects handle any other object as a message? Is this what you mean by the above?

Thinking about the biological analogy (maybe taking it too far...): the system of cells is distinct from the system of proteins inside the cells and going up the layers we have the systems of creatures. So the way proteins interact is different from how cells interact, etc. but each system derives its distinct behaviors from the lower ones. Also, the messages are typically not the entities themselves but other lower level stuff (cells communicate using signals that are not cells). So in a large scale OO system we might see layers of objects emerge. Or maybe we need a new model here, not sure.

Take a look at the first implemented Smalltalk (-72). It implemented objects internally as a "receive the message" mechanism -- a kind of quick parser -- and didn't have dedicated selectors. (You can find "The Early History of Smalltalk" via Google to see more.)

This made the first Smalltalk "automatically extensible" in the dimensions of form, meaning, and pragmatics.

When Xerox didn't come through with a replacement for the Alto we (and others at Parc) had to optimize for the next phases, and this led to the compromise of Smalltalk-76 (and the succeeding Smalltalks). Dan Ingalls chose the most common patterns that had proved useful and made a fixed syntax that still allowed some extension via keywords. This also eliminated an ambiguity problem, and the whole thing on the same machine was about 180 times faster.

I like your biological thinking. As a former molecular biologist I was aware of the vast many orders of magnitude differences in scale between biology and computing. (A typical mammalian cell will have billions of molecules, etc. A typical human will have 10 Trillion cells with their own DNA and many more in terms of microbes, etc.) What I chose was the "Cambrian Revolution Recursively": that cells could work together in larger architecture from biology, and that you can make the interiors of things at the same organization of the wholes in computing because of references -- you don't have to copy. So just "everything made from cells, including cells", and messages made from cells, etc.

Some ideas you might find interesting are in an article I wrote in 1984 -- called "Computer Software" -- for a special issue of Scientific American on "Software". This talks about the subject in general, and looks to the possibility of "tissue programming" etc.

I should have mentioned a few other things for the later Smalltalks. First, selectors are just objects. Second, you could use the automatic "message not understood" mechanism to field an unrecognized object. I think I'd do this by adding a method called "any" and letting it take care of arbitrary unknown objects ...
> adding a method called "any"

Right, I understand there are ways to do this with methods but my question was more about the purity aspect, which you already addressed above.

A selector is an object -- so that is pure -- and its use is a convention of the messaging, and the message itself is one object, that is an instance of Class message.

What's fun is that every Smalltalk contained the tools to make their successors while still running themselves. In other words, we can modify pretty much anything in Smalltalk on the fly if we choose to dip into the "meta" parts of it, which are also running. In Smalltalk-72, a message send was just a "notify" to the receiver that there was a message, plus a reference to the whole message. The receiver did the actual work of looking at it, interpreting it, etc.

This is quite possible to make happen in the more modern Smalltalks, and would even be an interesting exercise for deep Smalltalkers.

I've seen some of Bret Victor's talks but your description made it click!

> the source code is not technically stored with the class object

This is more of an optimization choice, perhaps? Given the link is maintained, it might be OK to say the source code and the bytecode form are two forms representing the same object?

Nothing is technically stored with the class object (everything is made from objects related by object-references). Semantically everything is "together". Pragmatically, things are where they need to be for particular implementations. In the early versions of Smalltalk on small machines it was convenient to cache the code in a separate file (but also every object -- e.g. in Smalltalk-76) was automatically swappable to the disk -- just another part of the pragmatics of making a very comprehensive system run on a tiny piece of hardware.
"This is more of an optimization choice, perhaps? Given the link is maintained, it might be OK to say the source code and the bytecode form are two forms representing the same object?"

Sure. :)