Seeding a userspace CSPRNG from /dev/urandom or getrandom(2) is actually the right approach. The kernel pool is a resource shared among all processes, with the locking and therefore the scalability constraints that sharing implies. A per-process generator sidesteps that contention.
It's also a good idea to reseed your userspace CSPRNG from the kernel's entropy source periodically. This is precisely what OpenBSD does with its arc4random and arc4random_buf library calls. (Don't worry, they now use ChaCha20 instead of arcfour/RC4, but have kept the function names unchanged, since the change is backward-compatible: the only way a caller could tell RC4 from ChaCha20 is by detecting the small statistical biases in RC4's output.)
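
To make the seed-then-reseed pattern concrete, here is a minimal sketch in Python. It is not how arc4random is implemented: the class name, the reseed_after threshold, and the use of SHA-256 in counter mode as a stand-in for ChaCha20 are all assumptions chosen to keep the example short and stdlib-only. Treat it as an illustration of the structure, not as vetted cryptographic code; it ignores things a real implementation has to handle, such as fork safety.

```python
import hashlib
import os


class ReseedingGenerator:
    """Toy userspace generator (hypothetical): SHA-256 in counter mode,
    keyed from the kernel and rekeyed after a fixed amount of output."""

    def __init__(self, reseed_after=1 << 20):
        # Bytes of output allowed per key; the default is an arbitrary choice.
        self.reseed_after = reseed_after
        self.reseed_count = 0
        self._reseed()

    def _reseed(self):
        # os.urandom reads from the kernel CSPRNG
        # (getrandom(2) on Linux where available, else /dev/urandom).
        self._key = os.urandom(32)
        self._counter = 0
        self._produced = 0
        self.reseed_count += 1

    def _block(self):
        # One 32-byte output block: SHA-256(key || counter).
        data = self._key + self._counter.to_bytes(16, "little")
        self._counter += 1
        return hashlib.sha256(data).digest()

    def random_bytes(self, n):
        out = bytearray()
        while len(out) < n:
            # Go back to the kernel once this key has produced enough output.
            if self._produced >= self.reseed_after:
                self._reseed()
            block = self._block()
            self._produced += len(block)
            out += block
        return bytes(out[:n])


gen = ReseedingGenerator(reseed_after=64)
token = gen.random_bytes(16)
```

Only _reseed touches the kernel; every other call runs entirely in the process, which is where the scalability win comes from. In real code, prefer the platform's own API (arc4random_buf where available, or the secrets module in Python) over rolling your own.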