|
|
|
|
|
by cryptonector
2806 days ago
|
|
Sure, you can have millions of 1MB stacks, but unless you have terabytes of RAM performance is going to suffer. If you write your app with more explicit state rather than implicit state (bound up in the stack), such as by writing a C10K style callback-hell, thread-per-CPU application, you're going to get much much better performance. The reason is that you'll be using less memory per-client, which means fewer cache fills to service any one client, which means less cache pressure, all of which means more performance. The key point is that thread-per-client applications are just very inefficient in terms of memory use. The reason is that application state gets bound up in large stacks. Whereas if the programmer takes the time to make application state explicit (e.g., as in the context arguments to callbacks, or as in context structures closed over by callback closures) then the program's per-client footprint can shrink greatly. Writing memory-efficient code is difficult. But it is necessary in order to get decent throughput and scale. |
|
The thread switching costs itself could be prohibitive to performance. IIRC, goroutine switching cost is roughly 1/10th of linux thread switching costs.