Hacker News new | ask | show | jobs
by richhickey 3643 days ago
Data like that sentence? Or all of the other sentences in this chat? I find 'data' hard to consider a bad idea in and of itself, i.e. if data == information, records of things known/uttered at a point in time. Could you talk more about data being a bad idea?
2 comments

What is "data" without an interpreter (and when we send "data" somewhere, how can we send it so its meaning is preserved?)
Data without an interpreter is certainly subject to (multiple) interpretation :) For instance, the implications of your sentence weren't clear to me, in spite of it being in English (evidently, not indicated otherwise). Some metadata indicated to me that you said it (should I trust that?), and when. But these seem to be questions of quality of representation/conveyance/provenance (agreed, important) rather than critiques of data as an idea. Yes, there is a notion of sufficiency ('42' isn't data).

Data is an old and fundamental idea. Machine interpretation of un- or under-structured data is fueling a ton of utility for society. None of the inputs to our sensory systems are accompanied by explanations of their meaning. Data - something given, seems the raw material of pretty much everything else interesting, and interpreters are secondary, and perhaps essentially, varied.

There are lots of "old and fundamental" ideas that are not good anymore, if they ever were.

The point here is that you were able to find the interpreter of the sentence and ask a question, but the two were still separated. For important negotiations we don't send telegrams, we send ambassadors.

This is what objects are all about, and it continues to be amazing to me that the real necessities and practical necessities are still not at all understood. Bundling an interpreter for messages doesn't prevent the message from being submitted for other possible interpretations, but there simply has to be a process that can extract signal from noise.

This is particularly germane to your last paragraph. Please think especially hard about what you are taking for granted in your last sentence.

Without the 'idea' of data we couldn't even have a conversation about what interpreters interpret. How could it be a "really bad" idea? Data needn't be accompanied by an interpreter. I'm not saying that interpreters are unimportant/uninteresting, but they are separate. Nor have I said or implied that data is inherently meaningful.

Take a stream of data from a seismometer. The seismometer might just record a stream of numbers. It might put them on a disk. Completely separate from that, some person or process, given the numbers and the provenance alone (these numbers are from a seismometer), might declare "there is an earthquake coming". But no object sent an "earthquake coming" "message". The seismometer doesn't "know" an earthquake is coming (nor does the earth, the source of the 'messages' it records), so it can't send a "message" incorporating that "meaning". There is no negotiation or direct connection between the source and the interpretation.

We will soon be drowning in a world of IoT sensors sending context-or-provenance-tagged but otherwise semantic-free data (necessarily, due to constraints, without accompanying interpreters) whose implications will only be determined by downstream statistical processing, aggregation etc, not semantic-rich messaging.

If you meant to convey "data alone makes for weak messages/ambassadors", well ok. But richer messages will just bottom out at more data (context metadata, semantic tagging, all more data) Ditto, as someone else said, any accompanying interpreter (e.g. bytecode? - more data needing interpretation/execution). Data remains a perfectly useful and more fundamental idea than "message". In any case, I thought we were talking about data, not objects. I don't think there is a conflict between these ideas.

2nd Paragraph: How do they know they are even bits? How do they know the bits are supposed to be numbers? What kind of numbers? Relating to what?

Etc

It contravenes the common and historical use of the word 'data' to imply undifferentiated bits/scribbles. It means facts/observations/measurements/information and you must at least grant it sufficient formatting and metadata to satisfy that definition. The fact that most data requires some human involvement for interpretation (e.g. pointing the right program at the right data) in no way negates its utility (we've learned a lot about the universe by recording data and analyzing it over the centuries), even though it may be insufficient for some bootstrapping system you envision.
Isn't the interpreter code itself data in the sense that it has no meaning without something (a machine) to run it? How do you avoid having to send an interpreter for the interpreter and so on?
Yes, so think about how to make this work "nicely" in an Intergalactic Network ...
It can't be turtles all the way down, so maybe set theory?
I think object is a very powerful idea to wrap "local" context. But in a network (communication) environment, it is still challenging to handle "remote" context with object. That is why we have APIs and serialization/deserialization overhead.

In the ideal homogeneous world of smalltalk, it is a less issue. But if you want a Windows machine to talk to a Unix, the remote context becomes an issue.

In principle we can send a Windows VM along with the message from Windows and a Unix VM (docker?) with a message from Unix, if that is a solution.

This is why "the objects of the future" have to be ambassadors that can negotiate with other objects they've never seen.

Think about this as one of the consequences of massive scaling ...

Along this line of logic, perhaps the future of AI is not "machine learning from big data" (a lot of buzz words) but computers that generate runtime interpreters for new contexts.
Sounds pretty much like the problem of establishing contact with an alien civilization. Definitely set theory, prime numbers, arithmetic and so on... I guess at some point, objects will be equipped with general intelligence for such negotiations if they are to be true digital ambassadors!
It's hard for me to grasp what this negotiation would look like. Particularly with objects that haven't encountered each other. It just seems like such a huge problem.

I don't really know anything at all about microbiology, but maybe climbing the ladder of abstraction to small insects like ants. There is clearly negotiation and communication happening there, but I have to think it's pretty well bounded. Even if one ant encountered another ant, and needed to communicate where food was, it's with a fixed set of semantics that are already understood by both parties.

Or with honeybees, doing the communication dance. I have no idea if the communication goes beyond "food here" or if it's "we need to decide who to send out."

It seems like you have to have learning in the object to really negotiate with something it hasn't encountered before. Maybe I'm making things too hard.

Maybe "can we communicate" is the first negotiation, and if not, give up.

Alan, what is your view on Olive Executable Archive ?https://olivearchive.org/
The Internet Archive (http://archive.org) is doing the same thing. They have old software stored that you can run in online emulators. I only wish they had instructions for how to use the emulators. The old keyboards and controllers are not like today's.
Their larger goals are important.
>Please think especially hard about what you are taking for granted in your last sentence.

Any Meaning can only be the Interpretation of a Model/Signal?

Information in "entropy" sense is objective and meaningless. Meaning only exists within a context. If we think "data" represent information, "interpreters" bring us context and therefore meaning.
Thank you - I was beginning to wonder if anyone in this conversation understood this. It is really the key to meaningfully (!!) move forward in this stuff.
The more meaning you pack into a message, the harder the message is to unpack.

So there's this inherent tradeoff between "easy to process" and "expressive" -- and I imagine deciding which side you want to lean toward depends on the context.

Check this out for a practical example: https://www.practicingruby.com/articles/information-anatomy

(not a Ruby article, but instead about essential structure of messages, loosely inspired by ideas in Gödel, Escher, Bach)

So the idea is to always send the interpreter, along with the data? They should always travel together?

Interesting. But, practically, the interpreter would need to be written in such a way that it works on all target systems. The world isn't set up for that, although it should be.

Hm, I now realize your point about HTML being idiotic. It should be a description, along with instructions for parsing and displaying it (?)

TCP/IP is "written in such a way that it works on all target systems". This partially worked because it was early, partly because it is small and simple, partly because it doesn't try to define structures on the actual messages, but only minimal ones on the "envelopes". And partly because of the "/" which does not force a single theory.

This -- and the Parc PUP "internet" which preceded it and influenced it -- are examples of trying to organize things so that modules can interact universally with minimal assumptions on both sides.

The next step -- of organizing a minimal basis for inter-meanings -- not just internetworking -- was being thought about heavily in the 70s while the communications systems ideas were being worked on, but was quite to the side, and not mature enough to be made part of the apparatus when "Flag Day" happened in 1983.

What is the minimal "stuff" that could be part of the "TCP/IP" apparatus that could allow "meanings" to be sent, not just bits -- and what assumptions need to be made on the receiving end to guarantee the safety of a transmitted meaning?

Would some kind of IDL not be enough to allow meanings to be sent?
Now it's to late to fix.
I don't think it's too late, but it would require fairly large changes in perspective in the general computing community about computing, about scaling, about visions and goals.
Data, and the entirety of human understanding and knowledge derived from recording, measurement and analysis of data, predates computing, so I don't see the relevance of these recent, programming-centric notions in a discussion of its value.
Wouldn't Mr. Kay say that it is education that builds the continuity of the entirety of human understanding? Greek philosophy and astronomy survived in the Muslim world and not in the European, though both possessed plenty of texts, because only the former had an education system that could bootstrap a mind to think in a way capable of understanding and adding to the data. Ultimately, every piece of data is reliant on each generation of humans equipping enough of their children with the mindset capable to use it intelligently.

The value of data is determined by the intelligence of those interpreting it, not those who recorded it.

Of course, this dynamic is sometimes positive. The Babylonians kept excellent astronomical records though apparently making little theoretical advance in understanding them. Greeks with an excellent grasp of geometry put that data to much better use very quickly. But if they had had to wait to gather the data themselves, one can imagine them waiting a long time.

This kind of gets into philosophy, but a metaphor I came up with for thinking about this (another phrase for it is "thought experiment") is:

If I speak something to a rock, what is it to the rock? Is it "signal," or "data"?

Making the concept a little more interesting, what if I resonate the rock with a sound frequency? What is that to the rock? Is that "signal," or "data"?

Up until the Rosetta Stone was found, Egyptian hieroglyphs were indecipherable. Could data be gathered from them, nevertheless? Sure. Researchers could determine what pigments were used, and/or what tools were used to create them, but they couldn't understand the messages. It wasn't "data" up to that point. It was "noise."

I hope I am not giving the impression that I am a postmodernist who is out here saying, "Data is meaningless." That's not what I'm saying. I am saying meaning is not self-evident from signal. The concept of data requires the ability to interpret signal for meaning to be acquired.

Computing has existed for thousands of years. We just have machines do some of it now.
What if "data" is a really great idea?
Your blog looks very interesting. You should share some links of it here on hackernews!
Thanks. I share it wherever I think it will add to the discussion.