Hacker News new | ask | show | jobs
by naasking 1359 days ago
A userspace thread captures full a stack context, a delimited continuation used in async programming captures only the referenced variables needed for the remaining computation. This is a strict subset of the userspace thread state.

For instance:

   fn f() { var v1 = ..., var v2 = ...; g(v2); }
   fn g(v2) { await; /* do something with v2 */ }  // await is a context switch
A userspace thread captures v1 and v2, an async computation typically only captures v2. Compound this by all variables on the stack up to the await point, and the difference can be substantial.
2 comments

I don't understand your example. What is resumed after the await in f?

Generally stackfull continuations can capture more than stackless, but do not have to. If the context switch in f resumes into g, then only v2 needs to be captured.

If it resumes in an (indirect) caller of f then v1 will have to be captured if still live, but then again, this is not expressible at all with stackless coroutines, without explicitly or implicitly suspending all immediate callers (which would end up capturing v1 anyway).

That is, a stackful continuation equivalent to a stackless one only need to capture the same amount of state.

Also I don't think that defining as delimited the async continuations as opposed to the stackful ones is correct. You can have stackful delimited continuations.

> That is, a stackful continuation equivalent to a stackless one only need to capture the same amount of state.

Of course if they're equivalent, then they're equivalent. That's simply not the case with goroutines vs. async functions in existing system where the program is written in a sort of continuation-passing style and so the captured state is more explicit.

Of course you could also perform some sophisticated transform a goroutine program into this form as well and, with a suitable static analysis, also shrink the captured state in this fashion. However, the fact is that no existing system works like this, and so what I wrote previously is an accurate description of the tradeoffs at this time.

So, you didn't give the explanation of what your snippet is supposed to do but, assuming that await returns control to f (as that the only thing that can happen with stackless coroutines) the equivalent in go would be to spawn a goroutine when calling g (pseudocode as I know approximately zero go):

   fn f() { var v1 = ..., var v2 = ...; go g(v2); }
   fn g(v2) { /* do something with v2 */ }  
There is no await as they are implicit in go. The coroutine spawned in f will only need to capture v2. I could make similar examples in cilk++, lua, or really any language with stackful coroutines.

Of course if you do not spawn a coroutine in f and it is instead part of another coroutine, v1 might be captured (unless the compiler identifies it as dead and reuses the stack slot, say, for v2). But to express the same with stackless coroutines you need to make f also a coroutine which will end up capturing v1 if live across the call.

Am I missing something?

> So, you didn't give the explanation of what your snippet is supposed to do but, assuming that await returns control to f

No, await is a context switch of some kind. In a stackful implementation the stack is switched to another thread at that point (say for an I/O wait), in an async implementation, the point after await is resumed with the live variables needed for the remainder of the program because it will be passed an explicit continuation.

> Of course if you do not spawn a coroutine in f and it is instead part of another coroutine, v1 might be captured (unless the compiler identifies it as dead and reuses the stack slot, say, for v2). But to express the same with stackless coroutines you need to make f also a coroutine which will end up capturing v1 if live across the call.

Yes, the idea is that v1 is not live, and existing stackful implementations will capture it regardless, where an async written program written in CPS form will not capture it. As I initially said the state captured by the latter is a strict subset of the former.

So, don't you agree that the stackful equivalent of your stackless example will use something like 'go g(v1)' which will only capture v1?

Without explicitly forking v2 will be captured, but then it is a completely different program with different semantics and it doesn't make sense to say that it captures more.

Edit to be more practical: in c++ you can have both stackless and stackful coroutines. If you write the same program, say using asio, with either feature, the same data will be captured.

> So, don't you agree that the stackful equivalent of your stackless example will use something like 'go g(v1)' which will only capture v1?

It will only capture v1 and also reserve a larger stack space in case the new computation needs it, where the stackless equivalent does not require this.

> Without explicitly forking v2 will be captured, but then it is a completely different program with different semantics and it doesn't make sense to say that it captures more

It doesn't have different semantics just because v1 changes ones space behaviour and not the other's.

I'm not interested in Turing tarpit arguments that one can be made equivalent to the other. As I've already said, the point is what existing systems encourage what sort of program architectures and what allocation behaviour naturally follows. It's long been evident that stackless abstractions clearly must capture strictly less state at any given time.

What is the fundamental difference between coroutines and userspace threads that makes such optimizations possible for coroutines and impossible for userspace threads?