Hey HN! I'm Suraj, one of the authors of this blog post. Andy (@acarl005) and I came up with the idea to write this post when we realized there weren't many existing resources on how a terminal works under the hood, end-to-end. If folks find this useful, we'd be happy to turn this into a series and dive deeper into sub-topics. Let us know what you think :)
> ASCII text would be transmitted character-by-character over the wire as the user typed.
How does that work exactly at a lower level, say the electrical signal? Would the ASCII text be encoded to binary, with 1s as high voltage and 0s as low? And if there's no data being transmitted, would it be all low voltage?
On the old terminals, the wire sits at one voltage level until a character is transmitted. After that, the bits are spaced out at agreed-upon time intervals. So the receiver detects the first transition, and then reads and stores the bits by applying the same timing, then waits for the next character. Meanwhile, the value of that character, say an 8-bit number, is stored in a register, and typically the processor is interrupted. This gives the processor time to deal with each character before the next one is received.
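To make the framing concrete, here is a minimal sketch of that receive logic in Python. It assumes the common convention of an idle-high line, one low start bit, 8 data bits sent least-significant bit first, and one high stop bit. It also takes exactly one sample per bit time, whereas a real UART oversamples. The names `encode_frame` and `decode_line` are made up for illustration.

```python
def encode_frame(byte):
    """Line levels for one character: start bit (0), 8 data bits LSB first, stop bit (1)."""
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def decode_line(levels):
    """Scan a list of line levels (one sample per bit time) and return the bytes received."""
    received = []
    i = 0
    while i < len(levels):
        if levels[i] == 1:
            # Idle line (assumed high): nothing is being sent, keep waiting.
            i += 1
            continue
        # A transition to 0 is the start bit; the frame needs 9 more samples.
        if i + 9 >= len(levels):
            break  # truncated frame at the end of the capture
        data = levels[i + 1 : i + 9]
        stop = levels[i + 9]
        if stop != 1:
            # Framing error: resynchronise by looking for the next start bit.
            i += 1
            continue
        # Reassemble the byte, least-significant bit first.
        received.append(sum(bit << n for n, bit in enumerate(data)))
        i += 10  # the "register" is full; wait for the next character
    return bytes(received)


# Idle line, then "h", a short pause, then "i".
line = [1, 1, 1] + encode_frame(ord("h")) + [1, 1] + encode_frame(ord("i")) + [1]
print(decode_line(line))
```

In hardware, the point where the byte is appended is roughly where the character lands in the receive register and the processor gets interrupted.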
Oddly enough I could have answered the OP's question in an interview, 40 years ago, but stuff has gotten so complex that I can't even tell you what all of the layers of abstraction are.
In the olden days, I'd drive my Fred Flintstone car to work. Serial I/O was done by wire-wrapping a UART chip on the board, which handled all the dirty details.
On my first PC, the serial port was an add-on board, and I wire-wrapped one to save a few bucks. My hands were trembling as I powered it up. I don't remember the chip either. Next time I "wired" a UART, it was built into a microcontroller. I also bit-banged one for the PIC16C84 MCU.
But asynchronous serial isn't completely obsolete yet. It just has more protocol layers on top of the physical layers.
It still works the same way, just abstracted as individual characters rather than the electrical representation of keys. Each character arrives on its own, with no context from whatever came before or after it. That makes some operations a lot easier and others a lot harder: you only have to make a decision for one character at a time, but you need to keep internal state if you want to display characters or do something with them as a whole.
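As a rough illustration of that internal state, here is a heavily simplified Python sketch of a parser that is fed one character at a time and still has to recognise multi-character escape sequences. It only knows about printable characters and CSI sequences (ESC, then `[`, then parameters, then a final byte in the range 0x40 to 0x7E). The class name and event tuples are invented for this example, and real terminal parsers track many more states.

```python
class SequenceParser:
    """Toy per-character parser: plain characters vs. CSI escape sequences."""

    def __init__(self):
        self.state = "ground"   # what the next character means depends on this
        self.params = ""        # parameter characters collected so far
        self.events = []        # what we decided to do, in order

    def feed(self, ch):
        if self.state == "ground":
            if ch == "\x1b":
                self.state = "escape"
            else:
                self.events.append(("print", ch))
        elif self.state == "escape":
            if ch == "[":
                self.state = "csi"
                self.params = ""
            else:
                # Some other two-character escape; record it and move on.
                self.events.append(("esc", ch))
                self.state = "ground"
        elif self.state == "csi":
            if "\x40" <= ch <= "\x7e":
                # Final byte: the sequence is complete.
                self.events.append(("csi", self.params, ch))
                self.state = "ground"
            else:
                self.params += ch


parser = SequenceParser()
for ch in "a\x1b[31mb":   # "a", then "set foreground red", then "b"
    parser.feed(ch)
print(parser.events)
```

Each call to `feed` makes a decision about a single character, but the `31` and `m` only mean something because the parser remembers that an ESC and a `[` came before them.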
I binged CuriousMarc’s YouTube series on teletype restoration a few months ago, so here’s a simplified (and possibly somewhat inaccurate) explanation of how one works.
The only electrical components in a Teletype are (a) a continuously-running electric motor, and (b) an electromagnet and a few switches. Everything else is completely mechanical.
The Teletypes in a circuit, and the electromagnet and switches within them, are all connected in series, forming a current loop. (Current, rather than voltage, is used because you can use a constant-current power supply to get the same power at each electromagnet regardless of how many are in the circuit and how many miles of wire are between them.)
When the circuit is idle, current is flowing. This allows any Teletype to begin transmitting; if it were the other way around, each one would need its own individual line power supply. (Incidentally, this is the origin of the “break” key still found on many modern keyboards: when the circuit was disconnected, nobody could transmit, so “breaking in” to the circuit was how you interrupted somebody, and a lot of early computers used this as a primitive version of ^C.)
When you press a key, a rod is lifted, activating a clutch that connects the motor to the rest of the machine, which starts running. The first thing this does is break the circuit, deactivating the electromagnets in the receiving mechanisms of all the other Teletypes. When this electromagnet deactivates, it trips the clutch in those machines, starting them running so they can receive the character. (This is why it’s called the “start bit.”)
One-tenth of the way through the rotation, a cam in the sending machine switches from the “start bit” to the switch for the first data bit. If the rod on the key you pressed has a bump on it in the right spot, it will press this switch and send a “mark”; if it’s missing the bump you’ll get a “space” instead. On the receiving end, a clever mechanism connects the electromagnet to a lever: if the magnet engages, it pushes the lever one way; and if it doesn’t, the lever stays pushed the other way.
At two-tenths of the rotation, the first data bit switch is disconnected and the second data bit switch engages. This of course reads the second bump on the rod, and at the receiving end the clever mechanism has moved so the magnet pushes (or doesn’t) the second lever.
(Notable exceptions: the ‘ctrl’ key forces the first two bits low—this is why e.g. Ctrl+D and “End of Transmission” are the same—and ‘shift’ forces the first bit high.)
At three-tenths of the rotation, of course, the third bit is read, transmitted and received; this process continues for the remaining data bits.
The final two-tenths of the rotation are known as the “stop bits”, and consist of some signal that doesn’t matter much. The receiving end uses this time to trip the print mechanism: the levers that were pushed-or-not during the data bits engage with some bumps in some other rods, blocking all but one of them that matches the specific pattern, and then a thingie shoves forward, slamming the appropriate type bar up into the ribbon and paper, and as it returns the carriage advances one character. (The reason for using two bits is simply to give some time for all of this to happen.)
Finally, the mechanisms on the sending and receiving ends complete their full rotation, disengaging the clutch and coming to an abrupt halt; the transmitting end reconnects the line so the circuit is available, and everything is ready for the next character. All of this has happened in less than a tenth of a second, a mechanical ballet choreographed too fast for the human eye to see, and transmitting information potentially hundreds of miles at the speed of light.
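The Ctrl behaviour mentioned above survives in ASCII today, and it can be sketched in a couple of lines of Python. This assumes the conventional description, where Ctrl clears the two high-order bits of the 7-bit code (i.e. keeps only the low five bits); the helper name `ctrl` is just for illustration and says nothing about the mechanical details of any particular Teletype model.

```python
def ctrl(ch):
    """ASCII code produced by Ctrl+<ch>, assuming Ctrl keeps only the low 5 bits."""
    return ord(ch) & 0x1F


for key in ["D", "d", "C", "[", "M"]:
    print(f"Ctrl+{key} -> 0x{ctrl(key):02x}")
```

Under that assumption Ctrl+D comes out as 0x04 (End of Transmission), Ctrl+C as 0x03, Ctrl+[ as 0x1b (Escape), and Ctrl+M as 0x0d (Carriage Return), which is why those key chords still behave the way they do in a terminal.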
Haha! The post was actually motivated by a common (interview) question: https://github.com/alex/what-happens-when. We thought it would make for a fun adaptation for the terminal world.
Would you pass this test if instead you answered what would happen if you open a terminal in a made-up operating system and you essentially designed the whole thing in your answer?
I have asked this question in interviews before, with focus on low level OS aspects. It depends on whether your made-up OS includes the things we care about.
I usually ask the question in an open way, but guide the candidate to "fast forward" past the (for us) boring parts, and focus on the interesting parts we are looking for along the way.
If your OS covers the concepts we are looking for and the answers seem competent, sure, let's give it a try, but don't be surprised if we veer off of that. If the OS is too simple to cover everything (not unlikely for actually existing toy OS projects, I have one myself), we'd want to branch out to those concepts no matter whether you included them in your hypothetical OS or not. And if your made up OS is just too different, it might not apply either.
In other words, ask yourself why this is part of the interview, and that should answer your question: Because you are going to work on a particular OS with particular characteristics.
I mean, points for creativity; it just depends on this made-up answer. If you're relying on transistors made up of unicorn farts and gumdrops, and the interviewer has no appreciation for whimsy, that's not gonna go over well. OTOH, if they do, that's bonus points. All I'm saying is that it's a gamble.
I don't think rote memorization applies very much here. When I ask the question, we start out open, let the candidate drive, but ask lots of questions along the way, and at certain points get very detailed on aspects that we care about particularly (while skipping quickly over, for us, uninteresting ones).
If you "rote memorized" both the high level view and the raw details so well that you can freely think and talk about it during the interview, I'd say you'd have to have had some experience as well.
terminfo would be a good topic for a couple of blog posts for sure. Back when I was first learning all this stuff, I never felt like I truly understood all the intricacies of this system. I tried to understand it back when alacritty made its own entry, but a lot of things still feel somewhat arcane.
Yeah I like to think of it as a capability list as well. There was a brief footnote [12] about $TERM in the post, but terminfo could certainly be dissected further.
Until you hit something like "Get-ChildItem : A parameter cannot be found that matches parameter name 'lh'" and are jarringly reminded that this is not your old friend `ls` after all.
This is why, if I use a shorthand in PowerShell at all, I avoid using "dir", "ls", and the like. "gci" is at least not like any common Unix/BSD/PC-DOS/Windows/OS/2 command.
Now let us compile the source without the code at all.
A terminal should only do anything when the user types something and presses enter, and then it should only do what the user told it to do. The idea that it goes off to the network unasked is beyond invasive.
But good move. The idea of a terminal tracking my typings and commands is pretty scary stuff, especially if I'm going to use this thing for security sensitive sessions.
I just found out about Warp, installed it, and I'm having a similar issue.
The terms are really weird for something I thought was a terminal app, and their common questions talk about "cloud-oriented features", which I really don't know about and probably don't want.
I'd be happy to pay (even per month) for a version of this that asks you "do you want to use our cloud features or just the local version?" and has different pricing and definitely no login.
Note: I didn't end up actually trying this, as I really didn't like the sound of not knowing what my terminal is sending out. They do list what they send for telemetry, but I'm not sure what is considered "cloud features".
Thanks for the comment. I definitely feel your concerns.
To clarify, all cloud-oriented features are fully opt-in. For example, you have to explicitly share a block for us to store it in the cloud.
You can also opt-out of telemetry (which we use to determine feature usage and plan our roadmap FWIW). We even have a network log so you can see every network call we make. More details here: https://www.warp.dev/blog/telemetry-now-optional-in-warp
Another issue is that when an update is available I need to be able to tell Warp that I don’t care at the moment.
Due to the corporate software on my work Mac, auto-updates fail. Ordinarily this isn’t a huge deal, as I’ll just download the DMG and replace it manually. But Warp drops down an obtrusive overlay that you can’t dismiss until you update.
I tried to like it, but little forced choices like that sent me back to iTerm 2.
Hey, sorry about that! We actually have a WIP fix for this exact issue (the obtrusive overlay that you can't dismiss). Should be out in the next week or two.
That article starts with too many inaccuracies to recommend it to anybody:
- Teletypes were designed about a century before the era of mainframes.
- Mechanical teletypes did not have an "I/O driver" in them.
- The OS on mainframe computers had neither a "kernel" nor an "I/O driver", "line discipline", or "TTY driver". This model was introduced with UNIX, an OS that ran on minicomputers.
- Was ChatGPT used to write this article? ;)
I recently stumbled upon an article on the same topic that contains competent and accurate information. I have the link because I recommended it to a friend: https://thevaluable.dev/guide-terminal-shell-console/
Hi. I definitely took some liberties with terms here. For example, I said "I/O driver" instead of, say, "UART driver" because I didn't want to get overly specific in the historical part. That wasn't the focus of this article. Thanks for linking this other one, as it fills in those parts well.
I’ve seen many descriptions of how the original teletypes worked and how that influences modern terminals.
However, something that I have not seen yet and I’d love to check is an attempt to redesign the terminal from scratch separating historical baggage from the parts that still make sense nowadays.
How would a designer create this tool without the historical precedent? Is tradition holding us back? What new standards could we reach?
I love this sentiment. Your questions would make for a very interesting post.
While we haven't rebuilt from absolute ground zero, Warp is definitely trying to extend the capabilities of a terminal (emulator) from what's historically been possible. For example, we introduced a dedicated input editor so you can have an IDE-like experience in the terminal. It's fundamentally different from how input is entered in a traditional terminal. But with this innovation, we've had to be careful to ensure that all the input features you expect in a normal terminal (even obscure ones like `alt-.`) work how you'd expect, _and then some_.
Overall though, starting from scratch is hard because we need to stay backwards-compatible with all the CLIs we use every day.
That was already done in the 1980s and early 1990s. Microsoft/IBM operating systems evolved the notion of consoles, still-handle-based I/O devices that responded to additional first-class I/O system calls (beyond the read/write/ioctl model) for 2-D addressing, direct output buffer manipulation (including reading of the output buffer), keyboard input that did not hide the decoding of key chords into characters, and mouse input.
We like some of the baggage for two main reasons. 1) some designs were well thought out back when resources and performance really mattered, a lot, so they make for fast and efficient implementations enforced by spec. 2) perhaps more importantly, we like our old tools to continue working, plain and simple.