
	by alankay 3652 days ago
	What is "data" without an interpreter (and when we send "data" somewhere, how can we send it so its meaning is preserved?)


richhickey 3652 days ago

Data without an interpreter is certainly subject to (multiple) interpretation :) For instance, the implications of your sentence weren't clear to me, in spite of it being in English (evidently, not indicated otherwise). Some metadata indicated to me that you said it (should I trust that?), and when. But these seem to be questions of quality of representation/conveyance/provenance (agreed, important) rather than critiques of data as an idea. Yes, there is a notion of sufficiency ('42' isn't data).

Data is an old and fundamental idea. Machine interpretation of un- or under-structured data is fueling a ton of utility for society. None of the inputs to our sensory systems are accompanied by explanations of their meaning. Data, something given, seems the raw material of pretty much everything else interesting, and interpreters are secondary and, perhaps essentially, varied.

alankay 3652 days ago

There are lots of "old and fundamental" ideas that are not good anymore, if they ever were.

The point here is that you were able to find the interpreter of the sentence and ask a question, but the two were still separated. For important negotiations we don't send telegrams, we send ambassadors.

This is what objects are all about, and it continues to be amazing to me that the real necessities and practical necessities are still not at all understood. Bundling an interpreter for messages doesn't prevent the message from being submitted for other possible interpretations, but there simply has to be a process that can extract signal from noise.

This is particularly germane to your last paragraph. Please think especially hard about what you are taking for granted in your last sentence.

richhickey 3652 days ago

Without the 'idea' of data we couldn't even have a conversation about what interpreters interpret. How could it be a "really bad" idea? Data needn't be accompanied by an interpreter. I'm not saying that interpreters are unimportant/uninteresting, but they are separate. Nor have I said or implied that data is inherently meaningful.

Take a stream of data from a seismometer. The seismometer might just record a stream of numbers. It might put them on a disk. Completely separate from that, some person or process, given the numbers and the provenance alone (these numbers are from a seismometer), might declare "there is an earthquake coming". But no object sent an "earthquake coming" "message". The seismometer doesn't "know" an earthquake is coming (nor does the earth, the source of the 'messages' it records), so it can't send a "message" incorporating that "meaning". There is no negotiation or direct connection between the source and the interpretation.
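As a rough illustration of the separation described above, here is a minimal Python sketch. The names (`Reading`, `record`, `looks_like_quake`), the sample values, and the threshold rule are all hypothetical; nothing here models a real seismometer or a real detection method.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Reading:
    source: str       # provenance tag, e.g. "seismometer-7" (hypothetical)
    timestamp: float  # seconds since some agreed starting point
    value: float      # raw amplitude; the recorder attaches no meaning to it


def record(source: str, samples: List[float], start: float = 0.0) -> List[Reading]:
    """The recorder: tags numbers with provenance and nothing else."""
    return [Reading(source, start + i, v) for i, v in enumerate(samples)]


def looks_like_quake(readings: List[Reading], threshold: float = 5.0) -> bool:
    """One possible downstream interpreter, written separately from the recorder.

    The threshold rule is an illustrative assumption, not a real detection method.
    """
    return any(abs(r.value) > threshold for r in readings)


stream = record("seismometer-7", [0.1, 0.3, 7.2, 0.2])
print(looks_like_quake(stream))
```

The recorder and the interpreter share only the shape of `Reading`; a different interpreter could be pointed at the same stream without the recorder changing.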

We will soon be drowning in a world of IoT sensors sending context-or-provenance-tagged but otherwise semantic-free data (necessarily, due to constraints, without accompanying interpreters) whose implications will only be determined by downstream statistical processing, aggregation etc, not semantic-rich messaging.

If you meant to convey "data alone makes for weak messages/ambassadors", well, OK. But richer messages will just bottom out at more data (context metadata, semantic tagging, all more data). Ditto, as someone else said, any accompanying interpreter (e.g. bytecode? - more data needing interpretation/execution). Data remains a perfectly useful and more fundamental idea than "message". In any case, I thought we were talking about data, not objects. I don't think there is a conflict between these ideas.

alankay 3651 days ago

2nd Paragraph: How do they know they are even bits? How do they know the bits are supposed to be numbers? What kind of numbers? Relating to what?

Etc

richhickey 3651 days ago

It contravenes the common and historical use of the word 'data' to imply undifferentiated bits/scribbles. It means facts/observations/measurements/information and you must at least grant it sufficient formatting and metadata to satisfy that definition. The fact that most data requires some human involvement for interpretation (e.g. pointing the right program at the right data) in no way negates its utility (we've learned a lot about the universe by recording data and analyzing it over the centuries), even though it may be insufficient for some bootstrapping system you envision.

mmiller 3650 days ago

I think what Alan was getting at is that what you see as "data" is in fact, at its basis, just signal, and only signal; a wave pattern, for example, but even calling it a "wave pattern" suggests interpretation. What I think he's trying to get across is there is a phenomenon being generated by something, but it requires something else--an interpreter--to even consider it "data" in the first place. As you said, there are multiple ways to interpret that phenomenon, but considering "data" as irreducible misses that point, because the concept of data requires an interpreter to even call it that. Its very existence as a concept from a signal presupposes an interpretation. And I think what he might have been getting at is, "Let's make that relationship explicit." Don't impose a single interpretation on signal by making "data" irreducible. Expose the interpretation by making it explicit, along with the signal, in how one might design a system that persists, processes, and transmits data.

jonathanlocke 3650 days ago

I think in the Science of Process that is being related as a desirable goal, everything would necessarily be a dynamic object (or perhaps something similar to this but fuzzier or more relational or different in some other way, but definitely dynamic) because data by itself is static while the world itself is not.

jsprogrammer 3650 days ago

Your selection of data is arbitrary.

Not only is your perception based on an interpreter, but how can you be sure that you were even given all of the relevant bits? Or, even what the bits really meant/are?

panic 3652 days ago

Isn't the interpreter code itself data in the sense that it has no meaning without something (a machine) to run it? How do you avoid having to send an interpreter for the interpreter and so on?

alankay 3651 days ago

Yes, so think about how to make this work "nicely" in an Intergalactic Network ...

jonathanlocke 3650 days ago

It can't be turtles all the way down, so maybe set theory?

alankay 3650 days ago

A good question, isn't it?

For parallel ideas and situation, take a look at Lincos https://en.wikipedia.org/wiki/Lincos_(artificial_language)

ontouchstart 3652 days ago

I think the object is a very powerful idea for wrapping "local" context. But in a network (communication) environment, it is still challenging to handle "remote" context with objects. That is why we have APIs and serialization/deserialization overhead.

In the ideal homogeneous world of Smalltalk, it is less of an issue. But if you want a Windows machine to talk to a Unix machine, the remote context becomes an issue.

In principle we can send a Windows VM along with the message from Windows and a Unix VM (docker?) with a message from Unix, if that is a solution.

alankay 3652 days ago

This is why "the objects of the future" have to be ambassadors that can negotiate with other objects they've never seen.

Think about this as one of the consequences of massive scaling ...

ontouchstart 3652 days ago

Along this line of logic, perhaps the future of AI is not "machine learning from big data" (a lot of buzz words) but computers that generate runtime interpreters for new contexts.

alankay 3651 days ago

It's not "Big Data" but "Big Meaning"

yanivt 3652 days ago

When high bandwidth communication is omnipresent, is "portability" of the interpreter really something to optimize for?

jonathanlocke 3650 days ago

Sounds pretty much like the problem of establishing contact with an alien civilization. Definitely set theory, prime numbers, arithmetic and so on... I guess at some point, objects will be equipped with general intelligence for such negotiations if they are to be true digital ambassadors!

alankay 3650 days ago

Yes, look at Lincos https://en.wikipedia.org/wiki/Lincos_(artificial_language)

DigitalJack 3651 days ago

It's hard for me to grasp what this negotiation would look like. Particularly with objects that haven't encountered each other. It just seems like such a huge problem.

I don't really know anything at all about microbiology, but maybe it helps to climb the ladder of abstraction to small insects like ants. There is clearly negotiation and communication happening there, but I have to think it's pretty well bounded. Even if one ant encountered another ant, and needed to communicate where food was, it's with a fixed set of semantics that are already understood by both parties.

Or with honeybees, doing the communication dance. I have no idea if the communication goes beyond "food here" or if it's "we need to decide who to send out."

It seems like you have to have learning in the object to really negotiate with something it hasn't encountered before. Maybe I'm making things too hard.

Maybe "can we communicate" is the first negotiation, and if not, give up.

alankay 3651 days ago

It is worth thinking of an analogy to TCP/IP -- what is the smallest thing that could be universal that will allow everything else to happen?

yawaramin 3650 days ago

Well, there's the old Component Object Model and cousins ... under this model an object a encountering a new object b will, essentially, ask 'I need this service performed, can you perform it for me?' If b can perform the service, a makes use of it; if not, not.
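The "can you perform this service for me?" exchange can be sketched roughly as follows. This is only loosely in the spirit of COM's interface querying and is not the COM API; `Component`, `query`, and `request` are hypothetical names, and services are identified by plain strings as a simplifying assumption.

```python
from typing import Callable, Dict, Optional


class Component:
    """Hypothetical object that advertises services by name."""

    def __init__(self, services: Dict[str, Callable[[str], str]]):
        self._services = dict(services)

    def query(self, service: str) -> Optional[Callable[[str], str]]:
        """Return a callable for the service, or None if it is not offered."""
        return self._services.get(service)


def request(b: Component, service: str, arg: str) -> Optional[str]:
    """What object a does: ask b for a service, use it if present, else move on."""
    handler = b.query(service)
    return handler(arg) if handler is not None else None


b = Component({"upper": str.upper})
print(request(b, "upper", "hello"))    # service offered by b
print(request(b, "reverse", "hello"))  # service not offered by b
```

Note that both sides still have to agree in advance on what the name "upper" means, which is the fixed-semantics limitation raised earlier in the thread.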

Another technique that occurs to me is from type theory ... here, instead of objects we'll talk in terms of values and functions, which have types. So e.g. a function a encountering a new function b will examine b's type and thereby figure out if it can/should call it or not. E.g., b might be called toJson and have type (in Haskell notation) ToJson a => a -> Text, so the function a knows that if it can give toJson any value which has a ToJson typeclass instance, it'll get back a Text value, or in other words toJson is a JSON encoder function, and thus it may want to call it.
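A loose Python analog of the type-directed check, for illustration only. Haskell resolves the `ToJson` constraint statically at compile time, whereas this sketch checks structurally at run time, so it is not equivalent. The `ToJson` protocol, `Point`, and `encode_if_possible` are hypothetical names.

```python
import json
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class ToJson(Protocol):
    """Hypothetical stand-in for the ToJson typeclass: anything with to_json()."""

    def to_json(self) -> str: ...


class Point:
    def __init__(self, x: int, y: int):
        self.x, self.y = x, y

    def to_json(self) -> str:
        return json.dumps({"x": self.x, "y": self.y})


def encode_if_possible(value: object) -> Optional[str]:
    """Call to_json only when the value structurally satisfies ToJson."""
    if isinstance(value, ToJson):
        return value.to_json()
    return None


print(encode_if_possible(Point(1, 2)))  # Point provides to_json
print(encode_if_possible(42))           # int does not
```

The run-time check only confirms that a `to_json` method exists; it says nothing about whether the method actually produces JSON, which is where a richer type system gives stronger guarantees.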

ontouchstart 3652 days ago

Alan, what is your view on Olive Executable Archive? https://olivearchive.org/

mmiller 3650 days ago

The Internet Archive (http://archive.org) is doing the same thing. They have old software stored that you can run in online emulators. I only wish they had instructions for how to use the emulators. The old keyboards and controllers are not like today's.

ontouchstart 3650 days ago

Here is another example: https://news.ycombinator.com/item?id=11155203

alankay 3651 days ago

Their larger goals are important.

ontouchstart 3651 days ago

Do you think they are on the right path to their larger goals?

valarauca1 3650 days ago

>Please think especially hard about what you are taking for granted in your last sentence.

Any Meaning can only be the Interpretation of a Model/Signal?

ontouchstart 3652 days ago

Information in the "entropy" sense is objective and meaningless. Meaning only exists within a context. If we think "data" represents information, "interpreters" bring us context and therefore meaning.
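A small sketch of the point about entropy, assuming the simplest per-symbol Shannon entropy computed from character frequencies (the function name `shannon_entropy` is just illustrative). Two strings built from the same symbols get the same number regardless of what they mean to a reader.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Entropy in bits per symbol, computed from symbol frequencies alone."""
    counts = Counter(message)
    total = len(message)
    # Sort the counts so the sum is taken in the same order for any
    # rearrangement of the same symbols.
    return -sum((c / total) * math.log2(c / total) for c in sorted(counts.values()))


# Anagrams: same symbol frequencies, different meanings to an English reader.
print(shannon_entropy("listen") == shannon_entropy("silent"))
```

The measure sees only frequencies; telling "listen" from "silent" takes an interpreter that knows English.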

jsa-aerial 3650 days ago

Thank you - I was beginning to wonder if anyone in this conversation understood this. It is really the key to meaningfully (!!) move forward in this stuff.

sandal 3650 days ago

The more meaning you pack into a message, the harder the message is to unpack.

So there's this inherent tradeoff between "easy to process" and "expressive" -- and I imagine deciding which side you want to lean toward depends on the context.

Check this out for a practical example: https://www.practicingruby.com/articles/information-anatomy

(not a Ruby article, but instead about essential structure of messages, loosely inspired by ideas in Gödel, Escher, Bach)

olantonan 3650 days ago

So the idea is to always send the interpreter, along with the data? They should always travel together?

Interesting. But, practically, the interpreter would need to be written in such a way that it works on all target systems. The world isn't set up for that, although it should be.

Hm, I now realize your point about HTML being idiotic. It should be a description, along with instructions for parsing and displaying it (?)

alankay 3650 days ago

TCP/IP is "written in such a way that it works on all target systems". This worked partly because it was early, partly because it is small and simple, and partly because it doesn't try to define structures on the actual messages, but only minimal ones on the "envelopes". And partly because of the "/", which does not force a single theory.
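The "minimal structure on the envelope, none on the message" idea can be sketched like this. The three-field layout below is invented for illustration and is not the real IP or TCP header; `ENVELOPE`, `wrap`, and `unwrap` are hypothetical names.

```python
import struct

# Hypothetical fixed-size envelope: source id, destination id, payload length.
# This layout is invented for illustration; it is not the IP or TCP header.
ENVELOPE = struct.Struct("!HHI")


def wrap(src: int, dst: int, payload: bytes) -> bytes:
    """Prefix opaque payload bytes with the envelope; the payload is never inspected."""
    return ENVELOPE.pack(src, dst, len(payload)) + payload


def unwrap(packet: bytes):
    """Read only the envelope and hand back the payload bytes untouched."""
    src, dst, length = ENVELOPE.unpack_from(packet)
    body = packet[ENVELOPE.size:ENVELOPE.size + length]
    return src, dst, body


packet = wrap(1, 2, b'{"any": "format"}')
print(unwrap(packet))
```

Because `wrap` and `unwrap` never look inside the payload, any message format can ride in it; the cost is that the envelope carries no meaning either.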

This -- and the Parc PUP "internet" which preceded it and influenced it -- are examples of trying to organize things so that modules can interact universally with minimal assumptions on both sides.

The next step -- of organizing a minimal basis for inter-meanings -- not just internetworking -- was being thought about heavily in the 70s while the communications systems ideas were being worked on, but was quite to the side, and not mature enough to be made part of the apparatus when "Flag Day" happened in 1983.

What is the minimal "stuff" that could be part of the "TCP/IP" apparatus that could allow "meanings" to be sent, not just bits -- and what assumptions need to be made on the receiving end to guarantee the safety of a transmitted meaning?

solidsnack9000 3649 days ago

Would some kind of IDL not be enough to allow meanings to be sent?

olantonan 3650 days ago

Now it's too late to fix.

alankay 3650 days ago

I don't think it's too late, but it would require fairly large changes in perspective in the general computing community about computing, about scaling, about visions and goals.