by thesz 1493 days ago
There are some advantages.

For example, on the Alpha AXP they measured that 60% of the energy spent in the device went to clock propagation. No clocks to tick - no energy spent. Why do we even need to clock the FPU? Or the bus, if we are in a loop that is in cache?

Another example: in an async design a ripple-carry adder exhibits O(log(N)) expected time, with a worst case of O(N), and most of the time it will be even less: O(log(L)), where L is the number of non-zero bits. Basically, adding 1 will be as fast as, well, doing AND and XOR in parallel. For a clocked design you need to make the adder more complicated to make sure that the worst case is O(log(N)).
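A rough way to see the O(log(N)) expectation is to measure the longest carry chain on random operands, since that is roughly what a self-timed ripple-carry adder has to wait for. The Python sketch below is only an illustrative model: the names `longest_carry_chain` and `mean_longest_chain` are made up here, and it counts bit positions a carry ripples through rather than real gate delays.

```python
import random


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Longest run of bit positions a single carry ripples through in a + b (n bits)."""
    longest = 0
    run = 0  # length of the carry chain that is currently alive
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:
            # Generate: carry-out is 1 regardless of carry-in, so a new chain starts.
            run = 1
        elif (ai ^ bi) and run:
            # Propagate: an incoming carry passes through this position.
            run += 1
        else:
            # Kill (both bits zero) or nothing to propagate: the chain ends.
            run = 0
        longest = max(longest, run)
    return longest


def mean_longest_chain(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Average longest carry chain over uniformly random n-bit operands."""
    rng = random.Random(seed)
    total = sum(
        longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
        for _ in range(trials)
    )
    return total / trials


if __name__ == "__main__":
    for width in (8, 16, 32, 64, 128):
        print(width, mean_longest_chain(width))
```

For instance, `longest_carry_chain(0, 1, 8)` is 0 (no carry at all), while `longest_carry_chain(0xFF, 1, 8)` is 8, the O(N) worst case. On random operands the mean should grow slowly with the width, roughly tracking log2(N).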

The same is true for other parts as well: a multiplier may not even need to wait for values multiplied by zero bits. You may end up with an O(log(N)) or even faster average-case multiplier.
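As a loose software analogy for the zero-bit point, a shift-and-add multiplier only has real work to do for the non-zero bits of one operand. This is a sketch under that assumption, not a model of any actual asynchronous multiplier; `shift_add_multiply` is a hypothetical name.

```python
def shift_add_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Return (product, number of partial products that actually needed an add)."""
    product = 0
    adds = 0
    for i in range(n):
        if (b >> i) & 1:
            # Non-zero multiplier bit: this partial product contributes.
            product += a << i
            adds += 1
        # Zero bit: the partial product is all zeros, so there is nothing to wait for.
    return product, adds


if __name__ == "__main__":
    print(shift_add_multiply(13, 0b1000, 8))  # one non-zero bit in b
    print(shift_add_multiply(13, 0xFF, 8))    # all eight bits of b set
```

The count of adds equals the number of set bits in `b`, which is the sense in which sparse operands could finish early in a self-timed design.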

Your design does not need strict adherence to timing requirements: if you have a seldom-used slow part, your chip would still work fast most of the time (on average). I know of one case where the clock frequency of a synchronous design had to be turned down because of problems in the placement of some, you guessed it, infrequently used part of the chip: a long bus line to an I/O controller that ran at the main clock frequency. This means your asynchronous design can be more modular.


> No clocks to tick - no energy spent

If only it were that simple. Logic gates take time to settle, and each input gate switch or transient will have a ripple effect on all its downstream gates, which can be many in a complex circuit. Synchronous logic elements such as latches will block the spurious transients from propagating beyond the next clock barrier, but if you lack those, you also lose the protection against propagating logic transients. And every transient draws a little bit of power.

Imagine the ripple effects of a 64-bit 2-operand multiplier (simple ripple-carry, as it's the easiest to reason about). Since the inputs are probably not gated either, each of the 4096 adder tree inputs may arrive at a different time, and each input has an average of 96 downstream gates (64/2 adder tree height, 128/2 carry propagation length). The carry propagation is done through and-gates, which have an attenuating effect on the propagation length (each input bit flip only has a 50% chance of propagating the change), but the xor-gates for the adder propagate every transient. On average, you still get 64 transients per adder input transient, and 2048 transients for every operand bit flip (64 and-gates * 50% = 32 flipped adder inputs, times 64 transients each). That's a lot to account for in your worst-case power envelope.
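The back-of-envelope numbers in that paragraph can be reproduced in a few lines of Python. This is just the comment's arithmetic written out, not a circuit simulation; the 64 transients per adder input is taken as given from the comment, and reading the 2048 figure as 64 AND gates times 50% times 64 transients is an interpretation. `transient_estimate` is a hypothetical name.

```python
def transient_estimate(bits: int = 64, transients_per_adder_input: int = 64) -> dict:
    """Reproduce the rough transient counts for a bits x bits ripple-carry multiplier."""
    adder_inputs = bits * bits                  # one partial-product bit per AND gate
    tree_height = bits // 2                     # "64/2 adder tree height"
    carry_length = (2 * bits) // 2              # "128/2 carry propagation length"
    downstream_gates = tree_height + carry_length
    # Each operand bit feeds `bits` AND gates; assume half of them change output.
    flipped_adder_inputs = bits * 0.5
    per_operand_flip = flipped_adder_inputs * transients_per_adder_input
    return {
        "adder_inputs": adder_inputs,
        "downstream_gates": downstream_gates,
        "flipped_adder_inputs": flipped_adder_inputs,
        "transients_per_operand_flip": per_operand_flip,
    }


if __name__ == "__main__":
    for name, value in transient_estimate().items():
        print(name, value)
```

With the default 64-bit width this gives 4096 adder inputs, 96 downstream gates, and 2048 transients per operand bit flip, matching the figures above.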

Yes, asynchronous designs are more flexible to work with. But they are less predictable in operation, not just in propagation delay but also in power usage. And you still need some form of inter-module communication, and that communication needs to account for differences in signal path length -- which is much easier to do if you can refer to a global clock.

I'm sure there have been successful asynchronous designs for specific applications (e.g. analog feedback control loops), and I haven't kept up with the last ten years of IC development which is a lifetime, but most asynchronous logic designs weren't necessarily faster than their synchronous implementations last time I checked.

Contemporary intermodule designs are pipelined and message-oriented exactly because it is hard to predict differences in signal path length over long paths. I am talking about the high-speed buses from ARM; I think I read about them in 2016 or so.

The same can be done with asynchronous designs, in a more relaxed way.

You said that asynchronous designs are less predictable in their use of power. Can you elaborate on that?

> The same can be done with asynchronous designs, in a more relaxed way.

Sure, just ask these guys:

https://chronostech.com/technology

Chronos Link: A QDI Interconnect for Modern SoCs https://ieeexplore.ieee.org/document/9179196

It's compatible with TileLink, which is SiFive's Fabric. https://bar.eecs.berkeley.edu/projects/tilelink.html

Another advantage is higher yield due to higher tolerance of production defects.

This implies yield loss is mostly due to small delay defects and not stuck-at faults. Are you sure this is the case?