Hacker News
by navaati 1016 days ago
> There is an obvious solution to that but

I'm feeling a bit dense today and it's not obvious to me. Please do tell what that solution is and why it doesn't generalize :).


Sorry, I should have added a link:

https://en.wikipedia.org/wiki/Dataflow_architecture

And the reason it doesn't generalize: it is typically implemented in special-purpose hardware.
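Here is a minimal software sketch of the dataflow firing rule from the linked article. There is no program counter: a node fires as soon as all of its input tokens have arrived. The graph format (a dict mapping each node to an operation and its input names) and the function `run_dataflow` are hypothetical illustrations. Real dataflow machines do this token matching in special-purpose hardware.

```python
import operator
from collections import deque

# Hypothetical graph format: node -> (operation, [names of input nodes]).
# Nodes with no inputs are external inputs supplied when the run starts.
GRAPH = {
    "a": (None, []),
    "b": (None, []),
    "c": (None, []),
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "c"]),
    "prod": (operator.mul, ["sum", "diff"]),
}


def run_dataflow(graph, inputs):
    """Deliver tokens and fire each node once all of its operands are present.

    Returns (values, delivery_order). There is no program counter: the order
    is decided purely by when operands become available.
    """
    consumers = {name: [] for name in graph}  # producer -> nodes that use it
    for name, (_, srcs) in graph.items():
        for src in srcs:
            consumers[src].append(name)

    values = {}
    order = []
    fired = set(inputs)             # nodes whose result token already exists
    tokens = deque(inputs.items())  # result tokens waiting to be delivered
    while tokens:
        name, value = tokens.popleft()
        values[name] = value
        order.append(name)
        for node in consumers[name]:
            op, srcs = graph[node]
            # Firing rule: a node is enabled only when every operand has arrived.
            if node not in fired and all(s in values for s in srcs):
                fired.add(node)
                tokens.append((node, op(*(values[s] for s in srcs))))
    return values, order


values, order = run_dataflow(GRAPH, {"a": 5, "b": 3, "c": 1})
print(values["prod"], order)
```

With inputs a=5, b=3, c=1, data arrival alone enables `sum` and `diff`. `prod` fires only after both of them have delivered their results.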

I have twice met professors who did their PhDs on dataflow in the late 1980s/early 1990s and lamented having wasted their time, since it turned out to be a dead end. I pointed out to them that the computer they were using had a hidden dataflow machine inside it: a front end converts the x86 code stored in memory into a dataflow graph (reorder buffers, reservation stations and so on) before actually executing it.
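The "hidden dataflow machine" point can be made concrete. The front end reads a sequential instruction stream that names architectural registers. Renaming replaces each source register with the tag of the in-flight instruction that last wrote it, and each such tag is a dataflow edge. This is a minimal sketch under several assumptions: the three-operand instruction format and the `rename` function are made up (this is not real x86 decoding), and it ignores retirement, flags and memory dependences.

```python
# Hypothetical three-operand format: (destination, operation, [sources]).
PROGRAM = [
    ("r1", "load", ["r0"]),        # i0
    ("r2", "add", ["r1", "r1"]),   # i1: needs i0's result
    ("r1", "mul", ["r3", "r4"]),   # i2: reuses r1, but needs neither i0 nor i1
    ("r5", "sub", ["r2", "r1"]),   # i3: needs i1 and i2
]


def rename(program):
    """Replace source registers with the tag of their latest producer.

    A source that becomes a tag like "i0" is a dataflow edge; a source left
    as a register name was produced before this window of instructions.
    """
    last_writer = {}  # architectural register -> tag of latest producer
    graph = []
    for index, (dest, op, srcs) in enumerate(program):
        tag = f"i{index}"
        # Look up sources before recording this write, so that an instruction
        # like "r1 = r1 + 1" depends on the previous producer of r1.
        sources = [last_writer.get(reg, reg) for reg in srcs]
        graph.append((tag, op, sources))
        last_writer[dest] = tag
    return graph


for node in rename(PROGRAM):
    print(node)
```

Because i2's result gets its own tag instead of overwriting r1 in place, the false (write-after-read and write-after-write) dependencies on i0 and i1 disappear, and i2 can execute alongside them. Hardware does this with something like a register alias table and a pool of physical registers rather than a Python dict.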

Perhaps one day the "all advanced processors are really RISC inside" meme will get replaced with "all advanced processors are really dataflow inside".

Of course, there is the option to make the dataflow visible to the outside with the EDGE (explicit data graph execution) architecture (Microsoft even showed Windows running on one).

Hello Jecel, your work was just what I had in mind when writing about this. How are you doing? Any major progress? I'd really love to see you succeed, I think what you were working on is one of the most interesting developments in computing.
We are still moving forward and hope to have interesting things to show soon. For any other readers, my Morphle Engine (aka SiliconSqueak 5) processor design has many elements in common with EDGE (and so with Dataflow as well).

I found the reference to Windows running on Microsoft's E2 processor (which in turn has links to relevant papers):

https://www.theregister.com/2018/06/18/microsoft_e2_edge_win...

I should also have mentioned The Mill architecture as an example of dealing with code as blocks of instructions instead of each one separately. In this case they get a higher fetch rate for their VLIW machine by starting execution from the middle of the block and fetching both up and down at the same time. So they have two program counters which get set to the same value on every jump or call.
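The two-program-counter trick can be sketched in a few lines. This is a toy that assumes fixed-size half-bundles, a flat list standing in for memory, and equal-length halves. The names are hypothetical, and the actual Mill encoding is considerably more involved. The sketch only shows how one jump target yields two fetch points, so each cycle can pull part of the bundle from each side.

```python
# Toy memory image: one half-bundle per slot. The block's entry point is
# address 4; the "up" half-bundles sit at and above it, the "down" ones below.
MEMORY = ["pad", "B2", "B1", "B0", "A0", "A1", "A2", "pad"]


def fetch_block(memory, entry, cycles):
    """Return one (up_half, down_half) pair per cycle, starting at `entry`."""
    pc_up = pc_down = entry  # a jump or call sets both counters to one address
    bundles = []
    for _ in range(cycles):
        up_half = memory[pc_up]          # up stream reads at increasing addresses
        down_half = memory[pc_down - 1]  # down stream reads just below its counter
        pc_up += 1
        pc_down -= 1
        bundles.append((up_half, down_half))
    return bundles


print(fetch_block(MEMORY, entry=4, cycles=3))
```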

https://millcomputing.com/

The point is that inside every advanced processor instructions are handled in groups and if you expose that to the compiler you can save transistors and energy.

GPUs are pretty close to an instantiation of a dataflow architecture. I have a soft spot for VLIW with deterministic memory latency, but a runtime scheduler with variable memory latency is probably better.
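The VLIW side of that trade-off relies on the compiler knowing every latency in advance. Here is a minimal sketch of a greedy compile-time scheduler. It assumes fixed latencies (3 cycles for a load), a bundle width of two, and made-up operation names. Real VLIW compilers use far more elaborate list or modulo scheduling.

```python
# Hypothetical operations: name -> (latency in cycles, [operations it needs]).
# Listed in dependency order so every producer is scheduled before its users.
OPS = {
    "load_a": (3, []),
    "load_b": (3, []),
    "add": (1, ["load_a", "load_b"]),
    "shift": (1, []),
    "mul": (2, ["add", "shift"]),
}


def schedule(ops, width):
    """Assign each operation an issue cycle, at most `width` per bundle."""
    issue = {}    # operation -> cycle it issues in
    bundles = {}  # cycle -> operations packed into that bundle
    for name, (_, deps) in ops.items():
        # Earliest cycle at which every operand is guaranteed ready. This is
        # only computable because latencies are assumed fixed and known.
        cycle = max((issue[d] + ops[d][0] for d in deps), default=0)
        while len(bundles.get(cycle, [])) >= width:
            cycle += 1  # bundle already full: slip to the next cycle
        bundles.setdefault(cycle, []).append(name)
        issue[name] = cycle
    return issue, bundles


issue, bundles = schedule(OPS, width=2)
for cycle in sorted(bundles):
    print(cycle, bundles[cycle])
```

Suppose a load actually took longer than the assumed three cycles, for example on a cache miss. The `add` in cycle 3 would then find an operand that is not ready, and the hardware would have to stall the whole machine. A runtime scheduler instead just issues whatever happens to be ready.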

The insight is that GPUs are memory driven - warps basically wait for memory operations to complete while other warps do stuff - so execution proceeds driven by memory access patterns.
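Here is a toy model of that latency hiding. It assumes one issue slot per cycle, a fixed 8-cycle memory latency, and a lowest-index-first scheduler. The program and numbers are made up, and real GPU schedulers and memory systems are far more complex. A "warp" here is just an instruction index plus the cycle at which it may issue again.

```python
# Hypothetical per-warp program: ALU work interleaved with memory loads.
PROGRAM = ["alu", "mem", "alu", "mem", "alu"]
MEM_LATENCY = 8  # assumed fixed load latency, in cycles


def simulate(num_warps, program=PROGRAM, mem_latency=MEM_LATENCY):
    """Issue at most one instruction per cycle from any warp not stalled on memory.

    Returns (total_cycles, busy_cycles).
    """
    next_instr = [0] * num_warps  # each warp's next instruction index
    ready_at = [0] * num_warps    # earliest cycle each warp may issue again
    cycle = busy = 0
    while any(i < len(program) for i in next_instr):
        for w in range(num_warps):
            if next_instr[w] < len(program) and ready_at[w] <= cycle:
                op = program[next_instr[w]]
                next_instr[w] += 1
                # A load stalls only this warp; the others can still issue.
                ready_at[w] = cycle + (mem_latency if op == "mem" else 1)
                busy += 1
                break  # single issue slot per cycle
        cycle += 1
    return cycle, busy


for warps in (1, 4):
    total, busy = simulate(warps)
    print(f"{warps} warp(s): busy {busy} of {total} cycles")
```

Under these assumptions one warp keeps the issue slot busy for 5 of 19 cycles, while four warps keep it busy for 20 of 25 cycles. Each warp still waits just as long for memory, but the others fill the gaps.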

Not quite, but yes, they are converging. GPUs are not general enough to be called dataflow computers, though there are some similarities. Unless I'm mistaken, SIMD is fundamentally different from dataflow: it executes each instruction in lockstep across a group of units, rather than letting each execution unit follow its own stream of opcodes. So GPUs have great performance but require tasks to be more inherently parallel than what you could (theoretically, mostly) do with a true dataflow computer.
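The lockstep point shows up most clearly with branch divergence. Here is a minimal sketch that models a group of lanes as a Python list and handles divergence purely by masking. Real GPUs have more elaborate divergence and reconvergence handling, and `simd_branch` is a hypothetical name. When lanes disagree on a branch, the whole group has to issue both paths one after the other. Units with independent instruction streams would each run only their own path.

```python
def simd_branch(lanes, cond, then_fn, else_fn):
    """Run `then_fn(v) if cond(v) else else_fn(v)` across lanes in lockstep.

    Every path that at least one lane needs is issued for the whole group;
    a per-lane mask decides which lanes keep the result.
    Returns (results, paths_issued).
    """
    mask = [cond(v) for v in lanes]  # per-lane branch outcome
    results = list(lanes)
    paths_issued = 0
    if any(mask):  # 'then' path: all lanes step through it, masked lanes discard
        then_out = [then_fn(v) for v in lanes]
        results = [t if m else r for t, m, r in zip(then_out, mask, results)]
        paths_issued += 1
    if not all(mask):  # 'else' path, with the mask inverted
        else_out = [else_fn(v) for v in lanes]
        results = [r if m else e for e, m, r in zip(else_out, mask, results)]
        paths_issued += 1
    return results, paths_issued


# Lanes disagree on the branch, so both paths are issued.
print(simd_branch([1, 2, 3, 4], lambda v: v % 2 == 0, lambda v: v * 10, lambda v: -v))
# Lanes agree, so only one path is issued.
print(simd_branch([2, 4, 6, 8], lambda v: v % 2 == 0, lambda v: v * 10, lambda v: -v))
```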