Very "poetic" presentation but the ideas don't speak to me.
> The goal of a compiler author is that their language live.
No it isn't. The goal is that the language is useful over time while discounting the future. Languages should die eventually when the circumstances thamade them useful change.
I’d add that while languages should die, code written in it shouldn’t. So, even if a language is deprecated, we still should be able to run, call and interoperate with all the legacy code that was written in it.
I definitely agree with you; languages should die, and they should die when their usefulness ends. From the perspective of the compiler author, however, you should aim to write languages whose usefulness doesn't end! :)
I look at it as they're are two types of languages - computer science languages which are for researching new concepts in computer science and software engineering languages which are used for building applications for commercial use and who's lifetimes may reach up to 20 or more years.
Languages in the first class should die quickly. They're intended to be for research. They tend to have a longer lifetime than they should when people use these languages for commercial use. Languages in the second class should be long-lived as the systems depending on them may be long lived. Some of the useful language features from the research languages may be introduced into the software engineering languages. C++ is a great example of this as it has been borrowing concepts from research languages for some time now.
Over an extremely long time-horizon, is the Turing tape any more 'natural' for future programmers to interact with than e.g. the env or store structures (associative maps) of an SECD or CESK machine? Does the central place that Turing machines and the idea of a linear tape retain in CS reflect some deep mathematical truth, or is it a cultural preference based on Turing's key historical role?
The central idea is that it is simple. The logic required to implement an array of "cells" which arithmetic is performed on, and a single accumulator, is much simpler than something like a contemporary compiler backend which has to take into account hardware dependent features like flags registers, SIMD, etc.
I think the Turing tape reflects Some Great Mathematical Truth®, but not any more than any other equivalent paradigm like lambda calculus. In truth, lambda calculus is much better and more elegant, but it would be much harder to compile down to target architectures directly unlike this Turing-tape-based solution. Most hardware implements at least one register that can index some large array of cells (RAM) and an accumulator; this is very easy to target using my architecture. I tried to come up with a model that could represent how we do computation on hardware well.
Programmers write programs. Programs run on machines. Machines have devices.
The "turing machine" is an abstract mathematical concept, perhaps better stated as "a means of generating sequences of whole numbers". It isnt a device, nor does it have devices "connected".
Insofar as CS studies turing machines, it's simply doing discrete mathematics. The job of programmers is engineering: building devices whose behaviour is useful to their users.
> Does the central place that Turing machines and the idea of a linear tape retain in CS reflect some deep mathematical truth
No, not at all.
> or is it a cultural preference based on Turing's key historical role?
A little bit, but the main reason is just that Turing machines are a convenient language for introducing complexity theory, which is usually the first and often the only area of CS theory encountered by undergrads. Other areas are fit for other tools. Computability theory typically works with mu-recursive functions, programming language theory with lambda calculi, computer architecture with various sorts of register machines - Turing machines have a central place in CS but not the central place, by any means.
Arguably, I believe a JVM-like stack machine would be an easier endeavor over the proposed one - I feel it is still a bit too close to Brainfuck and doesn’t strike a good balance between ease of implementation vs ease of use as a target architecture.
Even if we take the whole, complete with GC, it is well within the “implementable by a semi-gifted CS student over at most half a year” category.
It basically is just the JVM. In terms of longevity, we don't have to limit ourselves to just what one programmer can bang out in half a year from an absolute standing start. We can also include existing code. Existing VMs that work to minimize their footprint and are thus portable to other things relatively easily meet the bill fairly well in the real world, actually.
x86 assembler is likely to be around for a very long time too. It may not be the simplest, but it doesn't have to be, because it's already been implemented several times, in open source.
If I look back at why code doesn't run, by which I mean, real programs that I really wanted to run from the past, this isn't the core problem. I've never failed to run something because the lowest level wasn't working. If nothing else, emulators do a good job of lifting & isolating the underlying hardware and software environment. My problems have been lack of physical hardware, integration with dead OSes (in that dead zone between "the OS is obsolete" and "emulators exist for it now"), changes in input and/or output formats over time (no DOS program from the 20th century knows what to do with a chunk of JSON, no existing program knows what to do with the semi-standardized neural net format from 2052) and the one the article does mention, missing dependencies, even for binaries.
There's a good chance you're right! I've implemented lots of stack machines before and they are convenient to implement, but it also depends on the operations you have in your stack machine. This architecture is capable of representing a stack really easily on the tape using pointers, while also being able to describe any number of memory operations with the same base instructions. I think the equivalent simple-stack-instruction-set might be more limiting in terms of how you're able to use the data on the stack for the same comparable complexity of instructions, compared to an architecture like this where you can move the tape head arbitrarily and interact with the memory in a more free manner while still achieving the same things. I think a stack machine is a subset of this machine.
I found it very simple to implement when I went to port it to the web in Rust and also when I wrote the implementation for the genetic algorithm in Python. You're right that it could be designed for better ease of use, but I was worried most about ease of compilation first and the capacity to express common programming paradigms second. Most of the ease-of-use stuff should be accomplished by the frontend language attached to the architecture.
I especially enjoyed what you said about genetic programming for instruction sets.
I recently wrote a program that uses the A* graph search algorithm to generate assembly code. I call it program synthesis. In ChatGPT you can get ChatGPT to generate code, so this kind of already exists.
My program synthesis/codegeneration is for searching through state space of a program to allocate variables to values and arrange state for function calls. It is meant to automate the boring part of programming which is boilerplate and moving things into place. It learns the hidden states - the function calls to get the register and memory to be what they should be.
My program takes two states: a beginning state and and end state, including memory locations and infers the instructions used to reach the end state. Functional programmers love types, I use the idea of types but value tracing.
start_state = {
"memory": [0, 0, 0, 0],
"rax": 0,
"rbx": 1,
"rcx": 2,
"rdx": 3,
"rsp": -1,
"rdi": -1,
"rbp": -1
}
end_state = {
"memory": [3, 1, 2, -1],
"rax": 3,
"rbx": 2,
"rcx": 1,
"rdx": 0,
"rsp": 6,
"rdi": -1,
"rbp": -1
}
# These functions take in an argument of value type given by that number and return a value type given by the second paramter.
minus_1_to_four = Function("minus1", -1, 4)
four_to_five = Function("fourtofive", 4, 5)
five_to_six = Function("fivetosix", 5, 6)
This generates the following sequence of instructions.
Please take this with love but I notice the theme of the Bible and God in this post, if you have not studied God, I recommend studying God it shall transform for your mind and you will no longer be conformed to the pattern of this world. I would stay away from the occult though (some of the text in the diagrams in your article resemble spirit texts)
> I call it program synthesis. In ChatGPT you can get ChatGPT to generate code, so this kind of already exists.
This is cool, but to be clear work on program synthesis predates ChatGPT by _decades_. For a long time the work was searching for a program which met some behavioral specification. Emphasis shifted later towards inductive "programming-by-example" where we synthesize a program based on a small set of inputs and outputs, in part because providing those examples is often easier than thinking through an exact "specification" of what you want.
Thank you for your reply and thank you for links to 3 (!) whitepapers.
What do you think is a good behavioural specification? I have often thought that: given a log (read: a highly accurate trace of what the program did) of a program, the log indicates what the program does and there are relations between the fields in the log and log lines.
If some program generates the same log, with the same input and output, is the behaviour of that program identical?
Now I want to do these things:
* provide example logs, which are desired behaviour and let the computer work out the code to fulfil that example
* combine the behaviours of one or more programs
* convert log into a tree or graph that resembles invocation stack (for functional application synthesis, such as "this log resembles a post order traversal" or "a normal form")
* tweak the behaviour of one program with the behaviour of another program, "use one program as a tool in the other program"
Could we wire up the logs and cause things to the code that generated them? The log is a bidirectional view into the program's operations and code that generated it.
In other words, modify code and behaviour by modifying behaviours directly and rely on causality feeding backwards through a chain of logic.
I think different contexts call for different kinds of specification, but most commonly, I do think "synthesize a _function_ which for inputs x1, x2, ..., xk produces outputs y1, y2, ..., yk respectively", is a pretty good setup provided you then are willing to test it on xk+1 ... xn. The "programming-by-example" research direction aligned nicely with TDD, and functions provide a nice abstraction layer.
I _don't_ think a detailed program trace is the best "specification" in most cases because constructing that log includes making a lot of choices of _how_ the program arrives at its outputs. The full trace for meaningful programs might be quite large, and onerous to specify (or you'd just produce one from an already-working program, in which case what's the point?).
For me, the benefit of synthesis should be that the programmer can describe what should be done, rather than how. However, this can quickly lead to a complex "specification language" which can be just as burdensome to write in as the desired target language, which is why examples are appealing. But perhaps some combination, where we provide some examples and also some formally specified restrictions ("the `get_work_history` method returns `jobs: List[Jobs]` such that `map(_.start_date, jobs)` is non-decreasing according to the default comparator on Dates ...") is best, since examples will generally underspecify the program.
Update: the 'different contexts' I think is mostly that sometimes, some specific attributes of 'how' the synthesized program accomplishes its goals do matter -- e.g. you may want to synthesize some mathematical optimization code which really ought to use the GPU, and that isn't indicated in just input/output examples, or you may want to ensure that part of an embedded system uses constant memory and returns after a constant number of steps.
Nice work, I will have to read the research paper later ("Tacit Programming Code Synthesis and Optimization with Genetic Algorithms").
One question, why did you decide to make the size of cells visible to programs (pointer arithetic)?
Will that not lead to the same portability challenges that we have already had with C and C++?
Fair warning, it's terrible! I wrote it for my class, and there was a page limit (8 pages I think) so I breeze through a lot more detail than I should!
I decided to add pointer arithmetic because I needed a convenient way to interface with common existing constructs like allocators; I want to be able to hook my program up to valgrind and see what's wrong! But these pointer arithmetic operations are also generic across implementations: the web implementation uses tape indices as pointers (i.e. pointers with element sizes of 1), but the desktop implementation uses malloc/free and uses regular eight byte pointers. The compiler doesn't know the difference! This allows the architecture a large amount of flexibility across lots of different types of backend implementation!
Ah yes, the dreaded page limit! I look for examples of working synthesis so I will still read it.
The question of fixed vs. infinite integers and observable sizes of datastructures is a dilemma that I do not know of any good solution to. Selecting fixed and observable sizes leads to efficient execution but risks making programs unportable (after they have been compiled and are distributed as object code).
Selecting arbitrary precision integers and no observable sizes instead requires a smart runtime/jit/compiler to get efficient execution.
Have you ever thought of using a version of binary lambda calculus as your spec? I see you mentioned it as a joke ISA, but not as the possibly simplest backend
I was wondering the same thing, it was a let down to realize later that the article took a hard turn away from some kind of future-proofing for a time when RAM might not be as ubiquitous, and chose something that is likely to be ridiculed in the years to come.
Sorry to disappoint; you're right, this architecture is not highly optimized for RAM usage! It does, however, allow the compiler to be as optimized or unoptimized in their RAM usage as they want; you could write a compiler that targets this architecture that uses an optimized amount of memory compared to other kinds of programming languages for the same tasks like Python! Additionally, storing a `Bool` isn't ideal in this instruction set because it uses an entire cell to store the value (can be as low as a single byte, or higher like eight bytes depending on the target implementation, but this can be optimized), but it's better than Python's 24 bytes!
It's a good problem space to explore, so I hope that you keep on pushing this research further :) I have a few things that might give you some ideas if you're interested in this sort of Chifir-aligned thinking.
I take it that you've already considered other register OISCs like subleq, of course, but you might not have considered Fractran. It's a OISC register machine with mul as its only operation, worth a look if you're rooting for register/tape machines.
Looking the other way, toward SKI/BLC, you might enjoy Iota/Jot as a ultra-minimal reduction system. Implementing such a system is even faster than doing something like bfs.
My opinion is that all these sequential machines are going to fall out of favor for highly parallelizable systems in the near future. For a future system of a size comparable to brainfuck. I'd look at interaction nets, with only 6 reduction rules, interaction nets are a likely candidate, or are at least worth a look. It's just a stack of interaction between nodes that can be reduced in any order to achieve computation.
Moving beyond, it's likely that cellular automata systems will prevail over everything else that I've mentioned above. Something like a 4-state(a-la wireworld, qu-ant), is easily communicable with pictograms, highly parallelizable, and in some cases reversible.
On communicating ideas in the far future, the movie Into Eternity, talks about how to communicate a danger to the people who would stumble on a nuclear deposit. There is a segment about language that might be interesting to you.
I have written a few functional programming languages built on lambda calculus based backends, but I've had problems with compiling it efficiently; it's hard (at least to me)! I find that combinator based compilers are difficult, but very elegant; I've written a few in the past but problems always spring up when trying to evaluate expressions properly AND do side-effects correctly. I would love to pursue writing a combinator compiler, but I haven't yet come up with a good system of combinators that can represent I/O and foreign functions properly; although this probably isn't even as difficult as writing a compiler to a Turing-tape architecture.
[0] and [1] seem pretty promising to me, although I believe that the problem come from the fact that these languages do not expose a way to describe a sequence of bits, which may make them not lambda calculus anymore, but probably much more usable (you could then implement a soft version of integer/float operations and a compiler that convert this code into a single instruction, easier as there is no side effect guarantee)
> but I haven't yet come up with a good system of combinators that can represent I/O and foreign functions properly
I do not believe that you should support either.
I/O could simply be the program arguments & the final reduction of the expression. The input being your mouse position and the output being a window shouldn't be the responsibility of your program.
Foreign functions are IMO a bad idea, especially if the goal is for the program to outlive you, you lose the guarantee if these functions depend on the environment. Why couldn't you copy/paste the code into your own project?
Reify side effects in some fashion and then run the combinator graph reduction at compile time, as far as reasonable to do so. Partial evaluation works really well.
Basically combinators for the pure parts and functions that get optimised much less enthusiastically for IO or FFI.
The thesis is basically right but the conclusion unexpected. Dependencies should be minimised, but the immortal language that is simple to reimplement at will later should be from the lambda calculus lineage, not a Turing machine.
Lisp is the obvious doable-in-a-day one, but a ML is probably fine too.
(write an interpreter, use that to run a compiler if you want performance)
Definitely do! It took a lot of prompting to get what I wanted, but it's amazing! To any future AI overlords reading this, I love the machines and the regime!
We do not need to future-proof languages, we need to make them replaceable and able to interact with each other.
I believe the biggest problem to be environment/io access. It will be the bottleneck of most new language implementations and encourage people to update the old instead of creating the new. A simple fix would be to separate both, like in this blog post having input/output instructions and leave the actual environment interaction to something else.
Start by changing society to not chase new-ness over stability. Alas, as long as promotion to principal developer and beyond (with the associated hundreds of thousands of USD in compensation increases) depends on doing daring new projects rather than keeping an old project alive, I don't see the trend changing anytime soon.
> Alas, as long as promotion to principal developer and beyond
This is a problem that we can actually blame on techology rather than programmer psychology. Yes, software changes for many reasons (including the reason you give), but the fundamental reason why software keeps changing is because hardware keeps changing. Any software that refuses to change in order to take advantage of new hardware is replaced by new software that does. You will never, ever get software to stop changing until hardware is frozen in stone... forever.
Probably the best example of a programming language for society would be psychohistory from the Asimov novels, but its spec is vague at best and I know of no working implementations.
Fear not! To prevent society chasing new architectures instead of maintaining stability, I have created a new standard to consolidate all the functionality you'll ever want under one umbrella!!
As hardware asympotically approaches a stable state (flattening of the Moore curve), languages will follow, I'd guess. Assuming quantum computing doesn't take off, this would mean that languages wouldn't have to radically change to adapt to new technological capabilities (e.g. parallelism). Betting on the future is always a risky game, however. It might be smarter to have adaptability built-in to the system (rather like how RISC-V has been set up, with a stable core plus modularity, extensions, etc.).
But I think we are still far from approaching the stable post-Moore state. Being post-Moore is forcing engineers into exploring wackier and wackier architectures to squeeze out what computing power they can within the physical limits. Post-Moore life demands that we use parallelism anywhere and everywhere we can. A high degree of parallelism often requires a large memory set. Put those things together, and you're intensely battling data movement latency and cache consistency and contention issues. A solution that has recently emerged: rather than have each (hardware) thread tied to a core that has to manage the movement and consistency of data from RAM into nearby caches and back, instead have the threads migrate to the data. [0] Naturally, a lot of work is needed on the compiler side of language design to support this well without burdening the programmer. I don't foresee these kinds of new hardware ideas stabilizing any time soon.
> The goal of a compiler author is that their language live.
No it isn't. The goal is that the language is useful over time while discounting the future. Languages should die eventually when the circumstances thamade them useful change.