Hacker News new | ask | show | jobs
by Marcus10110 4222 days ago
This is a fundamental in FPGAs. The mode where the bit is unstable is called metastability. Most FPGA tools can automatically infer gray code for state machine states, and pretty much all cross clock domain FIFOs will use grey code to indicate where the read and write pointers are located.
2 comments

I remember someone (but not whom or where: maybe on HN or maybe a blog post) saying that all digital systems that accept and quantize analog input will have some analog input conditions with consequences that persist and extend arbitrarily far into the digital part of the system (so that theoretically you could cause an OS on a digital computer to crash just by pressing a key on the keyboard at the exact right time). Does anyone remember where this observation appeared, and is anyone's familiarity with metastability enough to clarify how accurate or inaccurate this description is?
It's theoretically possible but not practically so. Computers are actually analog, not digital. There's a lot of effort that goes into making sure that the analog signals behave enough like the ideal digital ones, but they're still analog.

So we've got very, very good at the engineering analysis to make sure that even though it's actually analog we can have a meaningful conversation about things and pretend it's not.

Pressing a key on the keyboard would never do that because there's a little microprocessor built in that actually handles getting keypresses and transmitting them to the real computer. Could you crash that computer by pressing a key at exactly the right time? No. Because they've designed it so that even in the metastable state it handles things correctly. A key defaults to off. If there is enough evidence over a sample period that a key was pressed then report on. If there's not enough evidence, it's off. Switch debouncing is a well established and practiced discipline. http://www.ganssle.com/debouncing.htm

In this context I'm wondering about the theoretical possibilities rather than the practical risk; I think the analysis that I'm referring to assumed the probability was negligible in practice (or can be made negligible in practice), so the question is whether it is still true that there is some analog input (even from a key on the keyboard) that in principle makes the machine misbehave even though no physically realistic process can typically actually generate that input.

The debouncing examples in the article you linked to don't seem to eliminate the risk in a theoretical sense. The RC circuit is taking an analog function of an analog function to apply some smoothing and significantly decrease the effects of transients on the observed voltage at the ADC, increasing the proportion of the time it will spend in a well-defined range if driven by a noisy switch. There must still be some set of analog inputs that would keep the voltage in an ambiguous range, though. The SR latch can itself experience metastability. (And the software solution should be right out, because it starts from the assumption that each individual digital measurement of the switch state has produced a well-defined binary value that can be safely used as input to expressions and functions at the software level.)

Wikipedia says that chaining latches together merely (dramatically) reduces the probability of this behavior, rather than actually eliminating it, because each latch could in principle (though with ever-decreasing probability) introduce and maintain metastability in the latch following it.

https://en.wikipedia.org/wiki/Flip-flop_%28electronics%29#Se...

Wikipedia cites to this article

http://ibm-1401.info/AnomalousSynchronizer_ChaneyMolnar_IEEE...

which seems to say that it was understood in the 1960s that every interface between digital circuits with no common clock (as well as every interface from an analog to a digital circuit) presented a theoretically "fundamentally inescapable" risk of introducing metastability which could propagate into the digital system, and that this was thought to be a source of some practical errors in computing systems in the early 1970s.

On the next page the author talks about ways to bulletproof this by avoiding latches and to use hysteresis to ensure that the debouncing is perfect. Not perfect in the sense that it always gets the state right -- it might register a button press when none is there if there was absolutely horrific EMI, or it might not register a button press that is too short -- but perfect in the sense that there are no undefined states. That's the point of hysteresis; you can't fake it out. If it's in the no-mans-land between decision points, it defaults to whatever it was previously.

You're right that any long chain of latches could end up metastable and screwed up. But this technique doesn't use latches, it uses multiple reads on an analog input which is interpreted digitally (but not an ADC) to preclude the possibility of any kind of undesired behavior.

The "reads on an analog input which is interpreted digitally" are effectively a one-bit ADC or comparator, aren't they?
Yes, but typically when you're talking about an ADC you're talking multiple bits. These days an 8 bit ADC is "low resolution" and 10-12 bits are nearly free (considering they're built into tons and tons of microcontrollers).

The point was more that it was a single bit, not a 10 bit ADC that's used to make determinations about switches. From a theoretical perspective the idea of a 1 bit ADC makes sense. From an engineering perspective, it doesn't. Since I'm an engineer that's why I said what I did.

There you go.[0] I believe it's called the 'arbiter problem' in that context, and your description seems basically correct.

[0]: http://research.microsoft.com/en-us/um/people/lamport/pubs/b...

Yes. This is a well known problem. See (https://en.wikipedia.org/wiki/Arbiter_[electronics]). It's not possible to build an arbiter which will reliably decide who wins in a fixed time. It is, however, possible to design one which can detect that the arbiter hasn't stabilized yet and delays until it has. All multiprocessor shared-memory systems need this.
Whoa, that paper is fascinating!

I'm trying to improve my intuition for why the physical exmaples in it are right. I found the examples of inevitable crashes and collisions disconcerting.

I think it's possible, but it becomes less and less likely with the depth of the latches.

Supposing your keyboard key input was latched by a flop and you pressed the key at the exact right moment you could violate the timings of the flop and have it end up in a metastable state.

That's why you usually put a bunch of delay flops when sampling such a signal in the hope that you'll manage to have a stable signal down the line, however that doesn't remove the problem altogether, it just makes the probability of the problem happening very low.

Huh, sounds kind of like probabilistic primality testing! (We can specify the probability of getting a false positive in the primality test and make it as low as we want, though it's still always formally possible to have false positives.)
Not only that, it is a standard within absolute encoders since years.