Hacker News new | ask | show | jobs
by VikingCoder 4444 days ago
I've been using threads to do co-operative multi-tasking, for a while now.

Every place that I'm tempted to write an event-driven finite state machine, or something similar, I spawn a thread instead. I get to write synchronous code, which feels much more natural to me.

For instance my actor, running in a thread, calls a function like advance(). That drops data into an object, and wakes up the main thread, and blocks.

The next time the main thread wants to give processing time to the actor, it describes the world into the same shared object, waked up the actor's thread, and blocks.

Doing a switch like this dozens or even hundreds of times per second seems to work pretty well, especially if the main thread only gives execution to the actor thread when it needs to - inputs have changed, etc.

For my use cases, it radically simplifies my code, and I have a small number of different inputs to handle, so it has been scaling well.

2 comments

That is precisely the beauty of coroutines: your code can look like threaded code but you can get the performance of an event-loop. Put another way, with coroutines you get most of the advantages of an event loop but very few of its disadvantages. (You get concurrency only, not parallelism, well this is almost true). For cooperative multi-tasking, threads are unnecessarily resource hungry and wasteful. They hoard resources although most of the time they are doing nothing just waiting. This is usually fine if you think yours is the only application that should be running on the hardware at that time, but typically you want to share the hardware with others.

Remember not everybody has the luxury of working on a system where you can spawn 10,000 threads without breaking a sweat.

However this territory of literally hundreds of thousands of programmed agents participating in a game does not seem to be very populated. Perhaps part of the reason is that very few languages had efficient (this eliminates stack copying), scalable and portable support of coroutines. This is starting to change, but not as fast as I would like.

I think C is to be blamed for the long under appreciated status of coroutines. It is one abstraction that C left out, although the VM C had as its execution model (the PDP) had excellent support for coroutines at the instruction level. C exported pretty much every abstraction of the underlying instruction set, but not coroutines.

EDIT: @VikingCoder Replying here as HN wouldnt allow me to respond till some time has past. Yes I have looked at asio although just scratched the surface. It looks very interesting, as far as I know they are not threads though (which is a good thing), they use macro and template metaprogramming trickery to turn producer-consumers into one big switch case. If you interested in coroutines and seamless interaction with C++ I can recommend http://felix-lang.org

> C exported pretty much every abstraction of the underlying instruction set, but not coroutines.

My hunch is that the designers of C would have said "goto" and "switch" cover the use case where you have a bunch of peer chunks of code that you want to freely bounce between.

Remember, at the time function calls were considered expensive, so not support full coroutines across function call boundaries may not have been on their minds as much.

Indeed, and thanks for the excellent commentary on coroutines, awaiting the blog post. I think another fact played into their decision: coroutines do not specify how they are to be scheduled, that leaves room for arbitrary policies. Not that C shied away from keeping things undefined.
Have you experimented with Boost Context and Boost Coroutine?
@srean - can you provide a link to the PDP support for coroutines? I would like to read more.
By the time I got within touching distance of any computer the era of the PDPs were long over. I have learned from John Skaller that they could exchange control between two stack frames in a single assembly instruction. "Exchange Jump" is what I think it was called. You will have to search the assembly manual for PDP-11 for more. Wikipedia has some details http://en.wikipedia.org/wiki/Coroutine#Implementations_in_as... But I am sure there are HN readers who can speak with way more authority and exhaustiveness than the wikipedia page and can probably point find you a PDP-11 manual. I think you will find this thread interesting http://permalink.gmane.org/gmane.org.user-groups.linux.tolug...

Quoting the most interesting bits from that thread, (although I urge you to read the original):

  Of the many styles of subroutine calls on the PDP-10, JSP ac,addr is the fastest,
  as it's the only one that doesn't require a memory store.

  Its ISP is something like:

        ac = PC
        PC = effective address [addr in the usual case]

  The subroutine return, of course, is:

        JRST (ac)

  Here, the efective address is the contents of the register.

  The coroutine instruction combined the two:

        JSP ac,(ac)

  This essentially exchanged the PC with ac.
Hi srean - thanks for that recap. I just did some more digging on this and tried to understand the assembly versions of coroutines. They were very spartan. It was just: POP the next address from the stack into TEMP, PUSH the current PC, then set the PC to TEMP. Notice that there isn't any linking or parameter passing.

Overall, it has been fun reading on all the variants of this idea.

side note: in my first job, there were a few PDP-11s in the lab that I was responsible for. We never turned them on though.

Also, the PDP 10, which you mention above, was one of the most revered machines by hackers.

Don't you still need to carefully use locks everywhere? For me the one reason to use coroutines instead of regular threads is that coroutines are cooperative multitasking, rather than preemptive. Which is what I want for when I have a bunch of concurrent but not parallel processes working on shared data.
I have a main thread, an actor thread, and a single object that they use to communicate. So yes, I have one lock, around the single object they use to communicate.

I'd need one lock per actor thread and its communication object.

I say again, this works in my problem domain, and probably wouldn't work in other domains.