by 3amOpsGuy 4727 days ago
Accessing databases, processing data from sockets, are not CPU bound activities? I believe you've misread my post.

For all your IO cases, and all your cases are IO, would you, and future maintainers of your code, not be better served with simpler abstractions which permit scaling past a single host?

2 comments

> Accessing databases, processing data from sockets, are not CPU bound activities?

I work on a product (network device) which involves all of these activities, and they are all memory latency bound. The overhead of task switching is far too high to recover any benefit from task-switching during memory stalls.

To top it off, the product performs a significant amount of computation, almost none of which fits a SIMT GPU model (i.e. there is a lot of branching).

The only performance solution for our product available from today's hardware is CPU parallelism.

I wasn't referring to the IO-bound side of it, but to the everyday general-purpose work that a GPU cannot do very well. It's silly to say the answer to parallelism is to throw everything on the GPU.

But on the IO side of the debate: many of the libraries you call in the C world are inherently blocking. 'gethostbyname', for example, is a blocking call. There is no async version of it. To use such calls without contention in a single-threaded application, you have to call them from worker threads.

The common pattern is to spin up a thread to make the call and do the work on it. It's often easier to keep your workers bound to threads like that, which simplifies your code, and to lock shared resources only when you need them. I can also write a massively async version of all my code that handles everything with async methods, and in many cases this is better, but it's harder to write and not always an option. It's something I have to deal with daily, because I run into the C10K problem all the time at work (http://en.wikipedia.org/wiki/C10k_problem).
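A minimal sketch of that worker-thread pattern, in Python rather than C for brevity. The function name `resolve_all` and its `resolver` and `max_workers` parameters are hypothetical, chosen only for illustration; `socket.gethostbyname` stands in for any blocking library call.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolve_all(hostnames, resolver=socket.gethostbyname, max_workers=8):
    """Run a blocking resolver once per hostname on worker threads.

    `resolver` is any blocking callable; the default is the blocking
    socket.gethostbyname. Names and defaults here are assumptions.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each submit() hands one blocking call to a worker thread,
        # so the calling thread is free until it collects the results.
        futures = {name: pool.submit(resolver, name) for name in hostnames}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except OSError as exc:
                # Lookup failures (socket.gaierror is an OSError) are
                # recorded per host instead of aborting the whole batch.
                results[name] = exc
    return results
```

The workers share nothing except the result dictionary, which only the calling thread writes to, so no locking is needed in this sketch.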

Even in the async model, though, I still want to run code in parallel, and I would still rather build that model with threads powering it than with multiple processes and shared memory.

A GPU, as you know, doesn't exist in isolation. It sits on a multicore host. The loading of input data and the writeback of results do not happen on the GPU, as I suspect you know. Maybe in future, with unified memory, this will be possible, but not on current devices.

The actual computation, the bit that was previously multi-threaded (or more commonly, multi-process) on a CPU, now lives on a GPU. I'm not sure what's silly? The compute-bound workload is now done on the GPU. The IO workload is still done on the CPU, in an inherently single-threaded fashion. Even when the multi-process computation was done on the CPU, load and store operations were still single-threaded. This stands to reason, since there is no advantage in splitting 500 concurrent host connections into 500 × (number of CPU cores) connections to hit a central data repository with...

I can't think of any code off the top of my head that calls gethostbyname repeatedly. Maybe a network server of some description that does reverse lookups for logging purposes? That seems inefficient, though. I can't think of a real-time use case for the host name when you're already in possession of the IP; I can only think of logging / reporting use cases, which would be better served by doing the lookup after the fact / offline.

If that's a valid example of what you're suggesting, then would the existing threaded code not be more efficiently implemented asynchronously? There's a finite limit to the number of threads you can create and schedule for these blocking calls; at some point you will have to introduce an async tactic. At that point, why not drop the threading altogether?
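To make the async tactic concrete, here is a rough sketch in Python of many concurrent waits multiplexed on one thread by an event loop. `fake_io`, `run_many` and `run` are hypothetical names, and `asyncio.sleep` is only a stand-in for a non-blocking socket or database wait; real code would await actual IO instead.

```python
import asyncio
import threading


async def fake_io(i, delay=0.01):
    # Stand-in for a non-blocking IO wait (an assumption for illustration).
    await asyncio.sleep(delay)
    # Report which OS thread resumed this task after the wait.
    return i, threading.get_ident()


async def run_many(n):
    # Start n waits concurrently; the event loop interleaves them
    # on a single thread rather than creating a thread per wait.
    return await asyncio.gather(*(fake_io(i) for i in range(n)))


def run(n=1000):
    results = asyncio.run(run_many(n))
    thread_ids = {tid for _, tid in results}
    # Returns (number of completed waits, number of distinct threads used).
    return len(results), len(thread_ids)
```

Here every task resumes on the same thread. The model only holds while nothing inside a task blocks, which is where the blocking-library objection earlier in the thread applies.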

You say you would rather build a model on top of threads. Why? Does it make your testing simpler? Does it reduce the time for new starts to get up to speed with your code? Does it reduce the SLOC count? Is it simpler to reason about?

I hope you would agree that in all these cases, and many more, threading is at a significant disadvantage. I stand by the assertion that it's dead(-ish).

The ish qualifier comes from another case we've not discussed, yet!