by dottrap 2934 days ago
I don't claim these are "go-to" solutions, but only that there are multiple solutions to pick from.

One solution is processes (mentioned in the post). Fork a process which does your computationally expensive thing and then get the result when it is done. For the security-minded, we've seen this make a bit of a comeback because separate processes can be run with more restrictions and can crash without corrupting the caller. We see this in things like Chrome, where the browser, renderers, and plugins are split up into separate processes. And many of Apple's frameworks have been refactored under the hood to use separate processes to try to further fortify the OS against exploits.
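
A minimal sketch of that idea in Python, assuming the expensive work can be expressed as a small standalone script. The names `CHILD_CODE` and `sum_of_squares_in_child` are made up for illustration, and the sum of squares is just a stand-in for real work.

```python
import subprocess
import sys

# Hypothetical CPU-heavy task, written as a one-line script for the child.
# With "python -c", sys.argv[1] is the first argument after the code string.
CHILD_CODE = "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"


def sum_of_squares_in_child(n: int) -> int:
    """Run the computation in a separate interpreter process and return its result."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, str(n)],
        stdout=subprocess.PIPE,
        text=True,
    )
    # This sketch simply waits. A real UI would check proc.poll() from its
    # event loop so the interface stays responsive while the child works.
    out, _ = proc.communicate()
    if proc.returncode != 0:
        # A crash in the child shows up here as an exit status,
        # not as corrupted state in the caller.
        raise RuntimeError(f"worker exited with status {proc.returncode}")
    return int(out)


if __name__ == "__main__":
    print(sum_of_squares_in_child(1_000_000))
```

The result comes back over a pipe as text, which is the simplest transfer mechanism, not the fastest one.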

Another solution is to break up the work and process it in increments. For example, rather than trying to load a data file in one shot, read a fraction of the bytes, then on the next pass through the event loop, read some more. Repeat until done. This can work with an async model (like in JavaScript) or with a polling model. Additionally, if you have coroutines (like in Lua), they are great for this because each coroutine has its own encapsulated state, so you don't have to manually track how far along you are in your execution.
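
Here is a rough sketch of the incremental approach using a Python generator in the role of the coroutine (Lua coroutines work along the same lines). `load_in_chunks` and `run_event_loop` are hypothetical names, and the loop is only a stand-in for a real UI event loop.

```python
import io
from typing import BinaryIO, Generator, Tuple


def load_in_chunks(stream: BinaryIO, chunk_size: int = 4096) -> Generator[int, None, bytes]:
    """Read one chunk per step, yield the byte count so far, return the full data."""
    chunks = []
    loaded = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        chunks.append(chunk)
        loaded += len(chunk)
        # Hand control back to the caller. The generator keeps its own
        # position, so no manual "how far along am I" bookkeeping is needed.
        yield loaded
    return b"".join(chunks)


def run_event_loop(task: Generator[int, None, bytes]) -> Tuple[bytes, int]:
    """Stand-in event loop: do one slice of work per tick until the task ends."""
    ticks = 0
    while True:
        try:
            next(task)
        except StopIteration as done:
            return done.value, ticks
        ticks += 1
        # A real loop would process input events and redraw the UI here.


if __name__ == "__main__":
    data, ticks = run_event_loop(load_in_chunks(io.BytesIO(b"x" * 10_000)))
    print(len(data), "bytes loaded in", ticks, "ticks")
```

The chunk size sets the trade-off: smaller chunks keep each tick short, larger ones finish in fewer ticks.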

> One solution is processes

More expensive to start than threads, and far more expensive and complex and restrictive to move data around. Sounds like with the exception of some specific corner cases, threads are a better solution.

> Another solution is break up the work and processing in increments

Either the tasks are broken into ridiculously fine-grained bits that are hard to make sense of or keep track of, or you still get a blocking UI. Furthermore, the solution is computationally more expensive.

Fork/exec time for extra processes is usually unimportant. If data transfer is truly a bottleneck, shared memory is as fast as threading.
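
To illustrate the shared memory point, here is a sketch that assumes a memory-mapped temporary file as the shared region, which is one portable way to get one but not the only way. The names `CHILD_CODE` and `increment_in_child` are made up, and the payload must be non-empty because an empty file cannot be mapped.

```python
import mmap
import os
import subprocess
import sys
import tempfile

# Child script: map the same file and modify the bytes in place.
CHILD_CODE = """
import mmap, sys
with open(sys.argv[1], "r+b") as f, mmap.mmap(f.fileno(), 0) as region:
    for i in range(len(region)):
        region[i] = (region[i] + 1) % 256
"""


def increment_in_child(payload: bytes) -> bytes:
    """Let a child process add 1 to every byte of a region both processes map."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "r+b") as f:
            f.write(payload)
            f.flush()
            with mmap.mmap(f.fileno(), 0) as region:
                # The child works on the same pages; the data is never
                # serialized or sent through a pipe.
                subprocess.run([sys.executable, "-c", CHILD_CODE, path], check=True)
                return region[:]
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(increment_in_child(b"\x00\x01\xff"))
```

The parent reads the child's result straight out of its own mapping. Coordination here is just "wait for the child to exit"; concurrent access would need real synchronization, same as with threads.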

These costs, though, are generally trivial compared to the lifecycle costs of dealing with multithreaded code. Isolation in processes greatly enhances debuggability, and it's almost impossible to produce a truly bug-free threaded program. Even a heavily tested threaded program will often break mysteriously when compiled with a different compiler/libraries, or even when seemingly irrelevant code changes are made. It's a tar pit.

> More expensive to start than threads,

Maybe, but, on Linux, processes and threads are almost the same thing.

Additionally, even where a process is a bit more expensive to create, the cost is not enough to keep the UI thread from being responsive. I have firsthand experience with this on different operating systems, including Windows, and it is more than fast enough to keep the UI completely responsive.

> and far more expensive and complex and restrictive to move data around.

Not necessarily. For threading, synchronization patterns are not necessarily simple. (This is why computer science instruction spends time on these principles.)

Furthermore, some languages and frameworks provide really nice IPC mechanisms. Apple's new XPC frameworks are pretty nice and make it pretty easy to do.

> Either the tasks are broken into ridiculously fine-grained bits that are hard to make sense of or keep track of, or you still get a blocking UI.

As I mentioned, coroutines make this dirt easy. In principle, this doesn't have to be hard.

> Furthermore, the solution is computationally more expensive.

That doesn't really follow. The underlying task is where the computation is. You are just moving it, either to a process, a thread, or dividing it up, or something else (e.g. sending it to a server to process). At the end of the day, it is the same work, just moved.

Yes, you might need some state flags for breaking up the work, but threading also requires resources such as creating and running the thread, the locks and protecting your shared data, and so forth. There is no free lunch any way you do this.

Processes might be more expensive but they do have advantages.

If you do use a lot of CPU time, spawning a process instead of a thread might not have any noticeable impact at all.

Additionally, IPC isolates the process, meaning it can be more resistant to hostile takeover (if you drop privs correctly), and you avoid any and all shared state that could possibly result in unforeseen bugs.

What’s the big O of starting a pool? It’s around O(1) either way, right?

Presumably the work processing time overwhelms the IPC time.