One typical approach is double-buffering the allocation but it doesn't work here because you need to pull out the waker list to call `wake()` outside of the mutex. You could try to put the allocation back, but you have to acquire the lock again.
I had an implementation that kept a lock-free waker pool around https://docs.rs/wakerpool/latest/wakerpool/ but now you're paying for atomics too, and it felt like this was all a workaround for a deficiency in the language.
Intrusive lists are the "correct" data structure, so I kept pushing.