Hacker News new | ask | show | jobs
by cogman10 681 days ago
So, what this project misses, which is quite hard to capture if you think of gates being just on off switches, is the fact that signals are not instantaneous, and everything runs in parallel.

As the AND gate 4 gates up the chain switches the NOT gate 4 gates down the chain starts to send different and unstable signals which may or may not be interpreted as a 1 or 0 in the downstream gate.

That's the reason computers have a clock, to make sure all transistors in a given stage of a CPU reach a steady state before moving on to the next instruction.

This is why it's probably a good idea to work with a HDL instead of just trying to wing it.

9 comments

The game Silicon Zeroes ( https://store.steampowered.com/app/684270/Silicon_Zeroes/ ) teaches this. It starts out with components computing their outputs instantaneously, but then introduces the concept of microticks such that the output of a component is unstable until its inputs are stable enough for some time, so the clock speed must be adjusted according to the largest delay in all circuit paths. The game starts off with simple circuits but very quickly becomes about making a CPU, although the ISA is hard-coded into the game and very small.

Another game Turing Complete ( https://store.steampowered.com/app/1444480/Turing_Complete/ , https://news.ycombinator.com/item?id=38925307 ) lets you build a CPU from basic gates and a much larger (and customizable) instruction set. It also has the concept of gate delays, thopugh it doesn't visually show the unstable output as Silicon Zeroes does.

Turing Complete is fantastic! It's a very much one of those games I'd say is like Kerbal Space Program, in the sense that you can be technical and have encountered all the concepts before, but bridging the gap where you actually grok what's happening intuitively isn't quite a leap you can make.
I found that Silicon Zeroes is a pretty crappy emulation of a Zachtronics game. IIRC each level has an intended solution and only gives you the exact parts to create that solution, even if a better solution would be possible with different parts you've used before.
While you're right that Silicon Zeroes is not the best as a zach-like puzzle game, I actually found it did a better job of teaching me about building a computer than most similar games.

However, I played it years before Turing Complete was released, and I'd probably recommend that over SZ. But SZ does a better job at introducing some of the timing concepts.

The earlier levels are like that, but the later levels have a lot more leeway.
Propagation delay, besides being physically unavoidable, is actually necessary for things like latches (and thus memory) to work, since they rely on feedback loops. The https://en.wikipedia.org/wiki/Ring_oscillator is another example.
The thing that got me into CS was building a CPU in Minecraft redstone, which is surprisingly good at being a logic simulator.

You only got (back in my day) NOT (the torch) and OR gates (wiring next to each other), and everything was built out of them. Signal repeaters had delay, the gates had delay, so you naturally had to build a clock to synchronise it all.

The main benefit that I enjoyed was that you could see the signals physically propagating from one end of your cpu to the other in real time (clock cycles were in the range of 1-4 seconds), flowing from the instructing fetching, decoding, dispatch, logic, writing to back memory, etc. Seeing signals slowly crawl from one end to the other naturally introduced you to pipelining (it even happens naturally if you increase the clock without thinking about what will actually happen: the next instruction starts decoding before the previous one is done, more parts of your cpu start lighting up at once, and oops now you have a pipelined cpu).

Even the scales match; many learners are surprised that the actual ALU is the tiny thing in the corner that you can barely see, and all the giant stacks of rows you actually saw is memory and cache. Even in minecraft, most of your CPU is not logic :)

Also really taught me how asics are much faster: you could build a tiny compact multiplier that multiplied hundreds of times faster than your giant cpu running a multiplication algorithm.

Looks like they community I learnt from is still around actually https://openredstone.org/, even if all the old forum posts seems to be gone now. There were some geniuses on that place building computers with multi-tier caches, full pipelining, out-of-order (albeit primitive) execution, SIMD-capable CPUs, all in redstone.

Shameless plug, I made this website which simulates Minecraft redstone as a hobby project many years ago:

https://mordritch.com/mc_rss/

Make sure to press the "Play" button at the top for the simulator to actually run.

The state of the art these days is https://github.com/MCHPR/MCHPRS

It's an implementation of the minecraft server that compiles redstone into a graph representation, and then does optimisations like constant folding, tree shaking, etc. Very cool project. If you ever see timelapses of minecraft computers running game of life, mandelbrot, etc it's most likely with this project.

This is awesome, thanks a lot!
> That's the reason computers have a clock, to make sure all transistors in a given stage of a CPU reach a steady state before moving on to the next instruction.

Here I was thinking[1][2] the reason computers had clocks was merely a consequence of the synchronous architectures that characterize them.

[1] https://en.wikipedia.org/wiki/Metastability_(electronics)

[2] https://en.wikipedia.org/wiki/Quasi-delay-insensitive_circui...

What point are you trying to make?

You are correct that clock free designs exist. But calling it a mere consequence of sync design seems to be a misunderstanding of why sync design has a clock in the first place.

In the case of synchronous design patterns that handle metastability, it was to point out that the clock clearly isn't there "to make sure all transistors in a given stage of a CPU reach a steady state". The clock is there to invoke state transition, whereas achieving steady state is a function of satisfying setup/hold times; the former is fundamentally constrained by the latter.

In the case of QDI circuits, it was to point out that there exists CPUs which do not contain clocks, again challenging the assertion that the reason computers have clocks is "to make sure all transistors in a given stage of a CPU reach a steady state".

The purpose of synchronous designs is to minimize the overhead of the handshaking that asynchronous designs bring with them. There is no other meaningful difference between them, really.
The netlist simulation of visual6502.org [1] works like this, it basically runs the simulation in a loop until the node network reaches a steady state, which means the current clock cycle is 'done'. This is a C port of that inner loop [2].

[1] http://visual6502.org/JSSim/index.html

[2] https://github.com/mist64/perfect6502/blob/268d16647c6b9cb0c...

>This is why it's probably a good idea to work with a HDL instead of just trying to wing it.

During my CE studies I got hooked heavily on FPGAs and HDLs, but the amount of times I actually ran into a timing problem, which only occured on hardware, because of these delays, can be counted on one hand. Playing around with this crap since about 2018. The worst one was where I got clock domain crossing wrong while trying to get DDR3 RAM working for my master thesis. It worked flawless in simulation. Took me weeks to find the wrong status signal

Yes, they can be a serious problem. Yes, you should know about it. But no, it's not necessary

I'm trying to think of how you could simulate the non-instantaneous/clocked aspect of it in code without using an HDL, so you could write a more accurate simulator in a familiar language. I'm thinking you'd want each gate to store its value on the current clock tick, then you'd have a RunTick() function or something which uses the previous values to calculate the new values on the current tick for everything, then stores those new values. I'm not sure how that'd work for an SR latch, a gated SR latch, or a D flip-flop, if you wanted to model those at a gate level; would you have to model those as primitives?

EDIT: After a bit of thinking and bouncing ideas off of ChatGPT, one approach might be to have an SR latch that gets initialized to some arbitrary valid output, which can then use that starting output to calculate future states. From there, you could build up more complicated elements.

Or you can build a computer on a breadboard: https://www.youtube.com/playlist?list=PLowKtXNTBypGqImE405J2...

I did this and not only was it great fun it taught me loads about clocks and busses etc. that was all magic before.

But there exists clockless architectures.
Is clockless useful in the face of modern ASIC tools? Not that you were suggesting in either direction, but it's been something in the back of my head for a while now.

Sure, there's a slight power and latency advantage in avoiding flip flops themselves but every tool has slack stealing and so the core advantage everyone seems to claim clockless (that is, the ability to have different latencies for all your different pipelines) isn't that unique anymore and hasn't been for a long time. Is it dead, can I stop worrying about the async demons :)

There's also things like a CMOS Z80 coupled with static ram. Where you still deal with propagation delay, but you can single step the clock and everything works like you expect. Mentioned, as it would fit pretty well in the code model the linked project implements.