by ipodopt 1476 days ago
It has been a sec, but if I were to do another multi-threaded async Rust project, I would run one async runtime per thread and explicitly pass anything that needed to be shared.

This should be more ergonomic, as it should get rid of everything needing to implement the Send/Sync traits. I also suspect it may be more performant, as I am not sure how good the async runtimes are about keeping scopes pinned to a particular core so that work is not constantly jumping around and busting the L1 caches (which would be extremely detrimental to compute latency and bandwidth)... Happy to be schooled on any of this.
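A minimal sketch of that layout, using only the standard library so it stays self-contained. It is not async: in a real project each worker thread would drive its own single-threaded runtime (for example one built with Tokio's `Builder::new_current_thread()`), but the ownership story is the same. The names `LocalState` and `run_sharded` are made up for illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::mpsc;
use std::thread;

/// Hypothetical per-thread state. `Rc<RefCell<..>>` is neither `Send` nor
/// `Sync`, which is fine because it never leaves the thread that created it.
struct LocalState {
    handled: Rc<RefCell<Vec<u64>>>,
}

/// Spawns `workers` threads, passes each job explicitly to one worker over a
/// channel (round-robin), and returns the sum of the jobs each worker handled.
fn run_sharded(workers: usize, jobs: &[u64]) -> Vec<u64> {
    assert!(workers > 0, "need at least one worker");
    let (result_tx, result_rx) = mpsc::channel::<(usize, u64)>();
    let mut job_txs = Vec::new();
    let mut handles = Vec::new();

    for id in 0..workers {
        let (job_tx, job_rx) = mpsc::channel::<u64>();
        job_txs.push(job_tx);
        let result_tx = result_tx.clone();
        handles.push(thread::spawn(move || {
            // Built inside the thread, so no Send/Sync bound is required.
            let state = LocalState {
                handled: Rc::new(RefCell::new(Vec::new())),
            };
            let alias = Rc::clone(&state.handled);
            // The loop ends once the sending side of the channel is dropped.
            for job in job_rx {
                alias.borrow_mut().push(job);
            }
            let sum: u64 = state.handled.borrow().iter().sum();
            result_tx.send((id, sum)).unwrap();
        }));
    }
    drop(result_tx);

    // Explicit passing: only plain `u64` values cross thread boundaries.
    for (i, job) in jobs.iter().enumerate() {
        job_txs[i % workers].send(*job).unwrap();
    }
    drop(job_txs); // close the channels so the workers can finish

    for handle in handles {
        handle.join().unwrap();
    }

    let mut sums = vec![0; workers];
    for (id, sum) in result_rx {
        sums[id] = sum;
    }
    sums
}
```

Calling `run_sharded(2, &[1, 2, 3, 4])` sends jobs 1 and 3 to the first worker and 2 and 4 to the second. Only the values sent through the channels need to be `Send`; everything a worker builds for itself can use `Rc` and `RefCell` freely.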

2 comments

But what about when you have some threads slacking off and others too busy? It would be nice in this case to use those idle threads, even if it means a little bit of CPU cache thrashing. And I believe this is what Tokio offers with a work-stealing thread pool.
True, but I suspect that without a truly global, prescient scheduler it is almost never worth it to switch cores unless you generally have really long tasks.

For an efficient core context switch, the scheduler must accurately predict that the source (current) core won't be free for the duration of the full core context switch time, and that the sink core will be free by the time the meta context gets there and will still be free by the time the rest gets there. Otherwise, the scheduler ends up thrashing the CPU (it is actually a bit worse, as a future task might need the same context, so you have to be aware of the future). So, for the scheduler to know this, it would need to be:

- Global: The only scheduler on the system or basically rafting with all the other schedulers on the system

- Prescient: The scheduler(s) would need to be able to predict all tasks, their context, and the work time per task perfectly, which could really only happen when everything is static and hence deterministic.
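The migration condition described above can be written down as a small predicate. This is only a sketch of the argument, not how any real scheduler decides: the `Estimate` struct and its fields are hypothetical, and a real scheduler would not have exact values for any of them, which is the whole point.

```rust
/// Hypothetical scheduler estimates, all in nanoseconds.
struct Estimate {
    /// How much longer the source (current) core will stay busy.
    source_busy_ns: u64,
    /// How much longer the sink (target) core will stay busy.
    sink_busy_ns: u64,
    /// Time needed to move the task's context to the sink core.
    migration_ns: u64,
}

/// Moving a task only pays off if the source core stays busy for longer than
/// the move takes and the sink core is free by the time the context arrives.
fn worth_migrating(e: &Estimate) -> bool {
    e.source_busy_ns > e.migration_ns && e.sink_busy_ns <= e.migration_ns
}
```

With made-up numbers, a source core that frees up in 1,000 ns against a 10,000 ns move gives `false`, while a source core that is busy for 100,000 ns with an idle sink gives `true`. Any error in the estimates flips the answer, which is why the scheduler would need to be both global and prescient.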

For example, I think most tasks people are throwing at async are web requests. Most take the core an order of magnitude less time to compute than it takes to pass the context from one core to another, and they are all unpredictable to the scheduler. In this scenario I could see the scheduler taking up the majority of computational time on the system. So turn on multi-threading + async on a quad core and you will get worse bandwidth and latency (always) for all your pains.

EDIT: Although this single data point would tell me I am wrong (see description):

https://www.youtube.com/watch?v=IG-wGXENTt8

Look at Glommio, it’s essentially what you describe.
Thanks for the link, very helpful :)