by ajross 2927 days ago
Building software, serving web pages, executing database queries, running a DOM layout, managing game logic... I mean, come on. You knew what I meant. Those are all tasks with "medium" cache residency and "occasional" stalls on DRAM. Anything that does a bunch of different things with a big-ish world of data.

Conversely: finding a task that is L1-cache-bound but does not frequently have to stall for memory is much harder. The only ones off the top of my head are streaming tasks like software video decode.

Oh, you meant typical for you.

One task that is L1 cache bound and does not frequently stall for memory (if you code it up well) is matrix multiply.
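To make "code it up well" concrete, here is a minimal sketch of loop tiling (cache blocking), which is the usual trick. The function names, the flat row-major layout, and the tile size of 32 are my own assumptions for illustration, not anything from a particular library. In pure Python the interpreter overhead swamps any cache effect, so read this only as the loop structure you would write in C or hand to a compiler.

```python
def matmul_naive(a, b, n):
    """Reference n x n multiply on flat row-major lists (i, j, k loop order)."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            total = 0.0
            for k in range(n):
                total += a[i * n + k] * b[k * n + j]
            c[i * n + j] = total
    return c


def matmul_tiled(a, b, n, tile=32):
    """Same product, computed one tile x tile block at a time.

    The idea is that the three blocks being touched (one each from a, b
    and c) are small enough to stay resident in L1 while they are reused.
    The tile size is a hypothetical default; a real one is tuned per CPU.
    """
    c = [0.0] * (n * n)
    for ii in range(0, n, tile):
        i_end = min(ii + tile, n)
        for kk in range(0, n, tile):
            k_end = min(kk + tile, n)
            for jj in range(0, n, tile):
                j_end = min(jj + tile, n)
                # Inner kernel: works only on the current three blocks.
                for i in range(ii, i_end):
                    for k in range(kk, k_end):
                        a_ik = a[i * n + k]
                        for j in range(jj, j_end):
                            c[i * n + j] += a_ik * b[k * n + j]
    return c
```

Each element of a block gets reused up to `tile` times before the block is evicted, which is why the tiled version spends most of its time hitting L1 rather than waiting on DRAM.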

> Oh, you meant typical for you.

I'm pretty sure those are meant to be, and I think are, "typical" for the general purpose CPU in use, and thus the general case.

Both mobile and desktop CPUs will be doing DOM layout, DB queries (whether to SQLite or the registry or just the filesystem), and possibly computing game logic on a regular basis.

It's becoming popular to push machine learning tasks onto edge devices like mobile and desktop CPUs, for example in apps that ship with an on-device model. Some of these machine learning algorithms do a lot of matrix multiplies.

"Typical" is highly varied, and it changes.

Edit: here's an example: Google brings on-device machine learning to mobile with TensorFlow Lite

https://thenextweb.com/artificial-intelligence/2017/11/15/go...

Would they mostly be using the CPU for that, or would they offload it to the GPU or a dedicated chip? I would assume you'd fall back to the general purpose CPU only if nothing else was available (and there's generally a GPU available on most end user devices these days).

The GPU if possible, but not every GPU has a library available, or enough documentation to write one. I've seen complaints about this issue on mobile GPUs for years, no idea how widespread it is now.

BTW, this is just one example algorithm that I picked because it does (on the CPU) what the person I replied to said was rare.

Running the model is much easier than training it. In power-constrained environments, DSPs can do it.