by optymizer 2205 days ago
The topic of how we as developers implement solutions in code has been on my mind for years.

The one insightful idea I found in this essay is that coding is a lossy one-way operation, from which you cannot fully derive the original idea or the 'theory'. That seems similar to losing information when compiling source code, making it impossible to restore the exact source code from its machine code representation.

So if we work backwards, it's: machine code (bits) -> source code (text) -> idea/solution (human thought?)

Despite losing some information, machine and source code have interesting properties: they are easy to copy, can be transpiled to a different format, etc.

What I'd like to ask the HN brain is if anyone can think of another way to express a higher level thought other than language? In his essay, Naur implies that there is no such thing. I wonder if we have made any progress on that front in the 35 years that have elapsed since this essay was written.

The only thing I can think of is something like UML, which has tons of diagram types for structural and behavioral properties of a system, but I've always found it hard to 'see' the real idea they're trying to describe, in the same way that I find it hard to imagine a 4D object by looking at its 3D projections. With enough effort it's certainly doable, but I wouldn't say the process is intuitive or easy. To me, diagrams are like projections of an idea from different points of view, but how do we encode the idea/thought/theory itself?

What is it about language and apprenticeship that makes conveying ideas or theories possible? I view this process as an inefficient way of serializing an idea and transmitting it over voice to another person, who has to deserialize the sounds, convert them to words, create the associations in their brain based on the meaning of those words, and then probe the correctness of those associations by asking clarifying questions.

Is this really the best we can do in 2020? How are other fields conveying complex abstract notions and ideas?

10 comments

You're butting your head against the fundamental paradox of communication: in order to communicate an idea that's in your head to somebody else, you have to encode it in a way that the other person will recognise and decode; that is, you need to already have some shared context. However, if you have a new idea then by definition it can't be part of the shared context, so it can't be communicated.

We get around this by invoking combinations of existing ideas and hoping that the recipient puts them together in more or less the right way: we might say "a leopard sits in the tree to your left", invoking the existing ideas "leopard", "tree", "to your left" and "sits", which can be combined in the obvious way. UML, musical notation, mathematics... all these are variations on "language" in the sense that they have a vocabulary of existing ideas, and a grammar of natural ways to combine them, and so you can bootstrap ideas in another person's brain by giving them pieces they already know and hoping they can assemble the idea correctly.

Language is messy and non-portable and unreliable, and it is exactly those properties which allow it to convey novel ideas from one person to another.

We get "around" this by ostension: words in a spoken (natural) language refer to the world.

I point to examples of trees and say "tree", etc.

"New" ideas are acquired by example -- language is not a closed system.

Which is harder than it sounds if you are trying to do it from scratch. Even pointing is a part of language, and trying to convey that concept is far from trivial. I think a big part of language learning in children involves the child seeing other people react to language and imitating them. E.g., father points and mother gazes in the direction of the finger, child follows mother's gaze.

To further this idea, dogs intuit the meanings of commands, some more easily than others, but teaching a dog to understand a human pointing at something is hard. My own dogs seem to interpret it as "just keep looking".

Ryle’s use of the term “theory,” which Naur adopts here, is a bit counter-intuitive; it’s really referring to a kind of operational knowledge.

One of Ryle’s central points is that there’s a categorical difference between being able to perform a skill yourself and knowing facts about how that skill is performed. Books, language, and observing other practitioners can only provide the latter kind of knowledge, but the former is what’s actually valuable.

Learning how to do something can only be achieved through practice. This practice can be guided and improved by rote learning, but cannot be replaced by it. Also, no two practitioners approach a skill in exactly the same way: As only rote knowledge can be transferred between people, each person builds their own structure of operational knowledge based on their particular experiences.

Naur’s argument here, then, is that this operational knowledge is dominated not by knowledge of building software in general, but instead by the understanding of how a particular piece of software functions internally and interacts with external factors. These external factors are both concrete, like protocol specifications, and abstract, like the competitive landscape the company is operating in.

Further, Naur argues that the primary value in developing a piece of software isn’t the software itself, but the expertise that the programming team had to develop in order to produce it. In this view, dismissing the programmers and keeping the software is a grave mistake that will surface when one of the external factors changes and there is no one qualified to update the software.

> What I'd like to ask the HN brain is if anyone can think of another way to express a higher level thought other than language? In his essay, Naur implies that there is no such thing. I wonder if we have made any progress on that front in the 35 years that have elapsed since this essay was written.

I think it could be argued that category theory, and categorical thinking more generally, are basically in this spirit. There’s a reason why a lot of folks think it’s the best thing since sliced bread.

The basic idea is that it has a sharply crystallized notion of what it means to have an analogy, which can piggyback on top of a bunch of essential structures from math to provide a language that is very effective for communication. Of course it’s only effective in communicating with people who share enough of that context.

An example of this spirit of using rigorous reasoning to communicate better is the Haskell motto that the existence of “design patterns” implies a failure of the language, namely a lack of expressiveness (more of a relative statement than an absolute one). If your language is any good, and your understanding of the pattern is sharp enough, then you should just be able to factor it out into a library. This lends itself to a programming style with highly modular, declarative and terse code.

Disclaimer: I’m not a Haskell expert by any means, so YMMV.
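To make the "factor the pattern into a library" point concrete, here is a minimal sketch in TypeScript rather than Haskell. The names `memoize`, `slowSquare` and `fastSquare` are made up for illustration; the idea is only that a recurring "cache the result of an expensive call" pattern can be written once as a generic function instead of being re-implemented by hand at every call site.

```typescript
// Hypothetical helper: the "cache the result of an expensive call" pattern,
// written once as a generic higher-order function.
function memoize<A, R>(fn: (arg: A) => R): (arg: A) => R {
  const cache = new Map<A, R>();
  return (arg: A): R => {
    if (cache.has(arg)) {
      return cache.get(arg) as R;
    }
    const result = fn(arg);
    cache.set(arg, result);
    return result;
  };
}

// Illustrative use: count how often the underlying function really runs.
let calls = 0;
const slowSquare = (n: number): number => {
  calls += 1;
  return n * n;
};

const fastSquare = memoize(slowSquare);
fastSquare(12);
fastSquare(12); // served from the cache; slowSquare is not called again
```

Whether every pattern can be factored out this cleanly depends on the language, which is presumably why the motto is a relative statement.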

"His hair is a flat-top; his mouth frowns in near grimace. He strides to my seat, looks down and says in a Texas drawl, 'and the key is simply this: Abstractions. New and better abstractions. With them we can solve all our programming problems.'"

— Richard P. Gabriel, “Patterns of Software”

While perhaps not directly relevant to your questions, your comment made me think of a book I read a while ago called The Information (https://en.wikipedia.org/wiki/The_Information:_A_History,_a_...).

It's well worth reading and made me appreciate the importance of Information Theory (and computers)!

"Is this really the best we can do in 2020? How are other fields conveying complex abstract notions and ideas?"

Shameless plug: it's been on my mind for a while as well. I'm writing a book on this that has (I believe) several novel insights.

There's far too much to cover in an HN essay, but perhaps my best response is that your idea of working backwards from machine code to thought is a mistaken paradigm. There's no translation or mapping, at least not in the sense that coders like to think. That's one of many fallacies we've run into on our journey over the last several decades to write better and more useful code.

More generally, there is a similar issue with deriving "meaning" from human actions: the question "what did he mean by that?"

Personally I think the problem starts at “shared understanding” and I think the most promising solution lies within combining ontologies (such as ones based on BFO, perhaps) with Abstract Syntax Trees or Concrete Syntax Trees.

I suspect future development will involve merging tests, as examples of program functionality, with data models and call graphs derived and annotated based on common ontologies.

We would have, in this future, the ability to “translate” programs from one language to another the way Google Translate does, not necessarily as correctly as if one understands the language’s native idioms, but as if one had a dictionary of words and phrases and their definitions and could translate snippets to relate an unfamiliar codebase to patterns.

This would be clearer if tagging of code to an ontology were baked into the language, the way type annotations are baked into TypeScript.

And TypeScript itself is an excellent example: even when the fine-grained information we annotate is generally useful only to programmers actively developing the program, it’s still very much a win-win.

I find myself frustrated now at how I can’t always rename string to other custom types if I want the string’s type to express meaning; similarly, not every language supports string literal types.
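For anyone who hasn't seen it, this is roughly what I mean, sketched in TypeScript. "Branding" is a community idiom rather than a built-in feature, and `Brand`, `UserId`, `Email`, `Role`, `toUserId`, `toEmail` and `describeUser` are all names invented for this example.

```typescript
// Branded string types: the brand exists only at compile time,
// so a plain string cannot be passed where a UserId is expected.
type Brand<T, Name extends string> = T & { readonly __brand: Name };

type UserId = Brand<string, "UserId">;
type Email = Brand<string, "Email">;

// Hypothetical constructors; real code would validate more carefully.
function toUserId(raw: string): UserId {
  return raw as UserId;
}

function toEmail(raw: string): Email {
  if (raw.indexOf("@") === -1) {
    throw new Error("not an email address: " + raw);
  }
  return raw as Email;
}

// String literal type: only these three exact strings are accepted.
type Role = "admin" | "editor" | "viewer";

function describeUser(id: UserId, role: Role): string {
  return `${id} has role ${role}`;
}

const id = toUserId("u-42");
const summary = describeUser(id, "editor");
// describeUser("u-42", "editor"); // rejected by the compiler: string is not a UserId
// describeUser(id, "owner");      // rejected by the compiler: "owner" is not a Role
```

At runtime these are still ordinary strings, so the extra meaning lives only in the type checker, which is useful but not quite an ontology.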

Languages are more expressive about types than they used to be, and it’s possible to make types dynamically at compile time, so languages have themselves become more flexible.

The ultimate goal would be to encode the system model so concisely that you would want to re-use the model or ontology in a number of systems, yet maintain a bidirectional relationship so if your database or a third-party system adds a constraint to the model, the model reflects that automatically. Vice versa if your model incorrectly encodes the real world, you should be able to refactor your programs by changing the model.

I suppose to make this ontology-based solution a bit easier it should be broken down into two parts: an ontology of computer software and hardware terms based on ISO BFO as an example, and a separate ontology representing the program’s problem domain, often outside of computer science.

There is something of a flaw in this logic: models rarely map exactly to the real world, and thus while you can annotate or tag software, nothing can save you from a bad model or one that needs to evolve.

To that end, ASTs and CSTs with automated code formatting can help again. There are programs that can mutate tests until they pass to automatically suggest fixes. You can write programs that rewrite programs automatically.

I actually think one part of the article aged poorly: the section claiming that program modifications are hard to do at scale. Actually, program changes can be trivial at scale these days, assuming you can avoid PR merge conflicts of course.

The tough part is ensuring you’ve enough knowledge of what the program is currently doing, its current behaviours and environment, as well as what it was meant to do.

One last thing, a program may entirely be theory not code, but as a counterpoint: any behaviours undefined by the program model or spec will eventually be relied upon by somebody at scale.

Which is another way of saying that sometimes a program dies because it is adopted too widely and thus can never evolve without confusing everyone and everything that uses it.

This is perhaps an argument that programs should constantly evolve and be built for evolution, that models should also. If so, git and GitHub help tremendously but we don’t have enough similar tools for modelling and ontology yet. We don’t have a standardized git or TS for adding model annotations to source code or trees/derived program artifacts. Git commit comments help but only a bit, they aren’t descriptive enough. Can we relate a commit to a model change? Or a production incident? How interlinked yet machine interpretable are our models and corresponding representations in code, in commit history?

And finally, can we make models and ontologies easy to use and update with less training and distraction? Could a system be built to help reverse engineer models from code by illustrating possible shapes, with a human then doing the work of researching the correct details and aligning all possible representations into one derived model? I’m thinking of how human-computer systems generally outpace human or computer decision making alone. If so, we rely far too much on humans to understand models encoded in code today, and should shift that burden back to the machine as much as possible to instead assist us and, where possible, spot mistakes in our models.

I fully agree about the role of an expressive test suite. One of the key insights of Kuhn as cited here is that a "theory" of gravitation requires examples like planetary motion and pendula. A UML Activity diagram can help to convey the application domain globally, but a good suite of unit tests helps me understand the micro-domain of a code base.

Have you looked into Flow-Based Programming by J P Morrison?

As a life-long programmer, I've also wondered often about how we express human thought and ideas in(to) software. The process is lossy, as you put it, in that so much is lost in translation.

On the other hand, it's also "enriching" in that software can (or must) define the concrete details of models and logic that were missing in the source (human thought).

> ..another way to express a higher level thought other than language?

If software development is the modelling ("theory building") of higher-level human thought (often ambiguous or ill-defined) into textual source code in a programming language..

The answer may come from the "language environment", richer features in an IDE/editor that integrate with the compiler and the abstract syntax tree. When you mentioned "projections", I thought about how text is one of many possible perspectives into a program. I can imagine other representations, perhaps UML-like, that an editor could provide as a view mode - that would allow exploring the models and logic flow of a program visually.

Another aspect I think of is that programming is a collaborative process that almost always involves people with domain-specific knowledge outside of programming. It could be that the biggest information loss occurs not in the encoding/programming, but in the (cross-cultural) communication between these spheres of thought.

..Which seems to imply that there ought to be developed a shared language - or conceptual framework - between experts in software development and those outside of it.

A shared language/framework would allow the encoding to happen at a higher level of abstraction, collaboratively - instead of one side encoding in (vague) human language, then the other side encoding in (precise) programming language.

My line of thought keeps returning to something like interactive UML diagrams that can be developed together by all stakeholders. Ideally, these would be living diagrams that are directly used by the software itself, to generate database schemas, internal models/classes, control flow.. But I'm skeptical whether "visual programming" as a paradigm is the answer, mainly because there have been countless unsatisfying attempts.

I find inspiration in the works of people like Seymour Papert ¹, Alan Kay ², Bret Victor ³, who challenge our existing notions of what programming is and could be.

A common thread among them is the focus on interface and environment, how textual representation of software is only one of many possible perspectives, that there's room for innovation in how we interact, explore, develop and communicate about software. A vital part of that is including "non-programmers", the rest of the world, in the development process.

I'm fond of the Whole Code Catalogue ⁴, and what the Future of Coding initiative is doing. I think this area of investigation is an important one, to continue to attempt to bridge the gap between human thought and software.

¹ http://papert.org/

² http://www.vpri.org/work/our_work.htm

³ http://worrydream.com/#!2/LadderOfAbstraction

⁴ https://futureofcoding.org/catalog/

geometry and visual proofs