| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by cglan 536 days ago

I’ve thought of something like this for a while, I’m very interested in where this goes.

A highly async actor model is something I’ve wanted to explore, and combined with a highly multi core architecture but clocked very very low, it seems like it could be power efficient too.

I was considering using go + channels for this

3 comments

jerf 536 days ago

The idea has kicked around in hardware for a number of years, such as: https://www.greenarraychips.com/home/about/index.php

I think the problem isn't that it's a "bad idea" in some intrinsic sense, but that you really have to have a problem that it fits like a glove. By the nature of the math, if you can only use 4 of your 128 cores 50% of the time, your performance just tanks no matter how fast you're going the other 50% of the time.

Contra the occasional "Everyone Else Is Stupid And We Just Need To Get Off Of von Neumann Architectures To Reach Nirvana" post, CPUs are shaped the way they are for a reason; being able to bring very highly concentrated power to bear on a specific problem is very flexible, especially when you can move the focus around very quickly as a CPU can. (Not instantaneously, but quickly, and this switching penalty is something that can be engineered around.) A lot of the rest of the problem space has been eaten by GPUs. This sort of "lots of low powered computers networked together" still fits in between them somewhat, but there's not a lot of space left anymore. They can communicate better in some ways than GPU cores can communicate with each other, but that is also a problem that can be engineered around.

If you squint really hard, it's possible that computers are sort of wandering in this direction, though. Being low power means it's also low-heat. Putting "efficiency cores" on to CPU dies is sort of, kind of starting down a road that could end up at the greenarray idea. Still, it's hard to imagine what even all of the Windows OS would do with 128 efficiency cores. Maybe if someone comes up with a brilliant innovation on current AI architectures that requires some sort of additional cross-talk between the neural layers that simply requires this sort of architecture to work you could see this pop up... which I suppose brings us back around to the original idea. But it's hard to imagine what that architecture could be, where the communication is vital on a nanosecond-by-nanosecond level and can't just be a separate phase of processing a neural net.

link

openquery 536 days ago

> By the nature of the math, if you can only use 4 of your 128 cores 50% of the time, your performance just tanks no matter how fast you're going the other 50% of the time.

I'm not sure I understand this point. If you're using a work-stealing threadpool servicing tasks in your actor model there's no reason you shouldn't get ~100% CPU utilisation provided you are driving the input hard enough (i.e. sampling often from your inputs).

link

jerf 535 days ago

To work steal, you must have work to steal. If you always have work to steal, you have a CPU problem, not a CPU fabric problem. CPU fabrics are good for when you have some sort of task that is sort of parallel, but also somehow requires a lot of cross-talk between the tasks, preferably of a very regular and predictable nature, e.g., not randomly blasting messages of very irregular sizes like one might see in a web-based system, but a very regular "I'm going to need exactly 16KB per frame from each of my surrounding 4 CPUs every 25ms". You would think of using a GPU on a modern computer because you can use all the little CPUs in a GPU, but the GPU won't do well because those GPU CPUs can't communicate like that. GPUs obtain their power by forbidding communication within cells except through very stereotyped patterns.

If you have all that, and you have it all the time, you can win on these fabrics.

The problem is, this doesn't describe very many problems. There's a lot of problems that may sort of look like this, but have steps where the problem has to be unpacked and dispatched, or the information has to be rejoined, or just in general there's other parts of the process that are limited to a single CPU somehow, and then Amdahl's Law murders your performance advantage over conventional CPUs. If you can't keep these things firing on all cylinders basically all the time, you very quickly end up back in a regime where conventional CPUs are more appropriate. It's really hard to feed a hundred threads of anything in a rigidly consistent way, whereas "tasks more or less randomly pile up and we dispatch our CPUs to those tasks with a scheduler" is fairly easy, and very useful.

link

openquery 536 days ago

Give it a shot. It isn't much code.

If you want to look at more serious work the Spiking Neural Net community has made models which actually work and are power efficient.

link

grupthink 536 days ago

I've implemented something similar to this using Golang Channels during Covid lockdowns. You don't get an emergence of intelligence from throwing random spaghetti at a wall.

Ask me how I know.

link