There is also another thread-per-core implementation by ByteDance (TikTok) for Rust called Monoio with benchmarks[0] comparing it to Tokio and Glommio.
The thread-per-core manifesto has a goal of not sharing data between cores, and thus the communication inside the process becomes message passing, handing off ownership of a chunk of data to the recipient core. This lack of sharing is what enables the performance (no locks etc needed, outside of the message passing).
[0] https://github.com/bytedance/monoio/blob/master/docs/en/benc...