Hacker News
by saman_b 3517 days ago
You are right. I have access to machines with more cores, but they have multiple sockets, and at some point I would need to address the cross-NUMA cost, which adds a whole new level of complexity and design decisions.

For sure, at some point the poller thread will be saturated and the program will stop scaling past a certain number of threads. I used to have one poller thread per cluster for better scalability, but that added overhead to migrations between clusters, so I removed it for now until I can find a low-overhead solution. uThreads is a work in progress, and all of this needs to be carefully considered in the future :) Thanks for your feedback.


Sometimes the techniques you use to scale to hundreds of threads also solve some NUMA issues, because scaling that high requires touching as little non-local data as possible. I think it's better to just deal with the pain now and start running your experiments on as large a machine as you can get access to. You can still put off explicitly designing for NUMA, but you want to avoid spending too much time and effort designing for the lower end of the scalability spectrum.