Daytona from AT&T Labs built the database into the OS... I don’t think it worked very well. The inverted model makes more sense. A process per query isn't very scalable.
So Daytona compiles and runs a new binary for each query, not just for each connection, and does all its coordination through files + pipes + shared memory. Query binaries even fork themselves for parallelism(!). My understanding is that the cost of switching between kernel and user space is pretty overwhelming. Maybe STM would help with that someday, but now you’ve implemented an actual database in the kernel...
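For anyone who hasn't seen the fork-and-pipe pattern being described, here is a minimal sketch in Python. It is not Daytona's code; `parallel_sum` and the chunking scheme are hypothetical names made up for illustration, and it assumes a POSIX system where `os.fork` is available. Each child computes a partial result and sends it back to the parent over its own pipe.

```python
import os

def parallel_sum(chunks):
    """Fork one child per chunk (hypothetical helper); collect partial sums via pipes."""
    readers = []
    for chunk in chunks:
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: compute the partial sum, write it to the pipe, and exit
            # immediately without running the parent's cleanup handlers.
            os.close(r)
            os.write(w, str(sum(chunk)).encode())
            os.close(w)
            os._exit(0)
        # Parent: close its copy of the write end so reads see EOF
        # once the child has finished.
        os.close(w)
        readers.append((pid, r))

    total = 0
    for pid, r in readers:
        with os.fdopen(r) as f:
            total += int(f.read())
        os.waitpid(pid, 0)  # reap the child
    return total
```

Every `fork`, `pipe`, `read`, `write`, and `waitpid` here is a system call, which gives a feel for how much kernel/user crossing a process-per-query design involves even for trivial work.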
PostgreSQL’s process-per-connection model makes way more sense, but as you say it still has upper limits on concurrency.
Although PostgreSQL doesn't handle unbounded concurrency, there are techniques (like manual locking, or connection pooling with pgbouncer) to deal with this.
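To make the pooling idea concrete, here is a rough sketch of a bounded pool in Python. This is not how pgbouncer is implemented (it is a separate proxy process sitting in front of the server); `Pool`, `make_conn`, and `acquire` are hypothetical names for illustration. The point is only that many clients share a fixed number of backend connections, and extra clients wait instead of spawning more backend processes.

```python
import queue
from contextlib import contextmanager

class Pool:
    """Hypothetical fixed-size pool: at most `size` connections ever exist."""

    def __init__(self, make_conn, size):
        # make_conn is an assumed factory that returns a new connection object.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(make_conn())

    @contextmanager
    def acquire(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always return the connection so waiting clients can proceed.
            self._idle.put(conn)
```

Usage would look like `with pool.acquire() as conn: ...`, with the pool size chosen to stay below whatever connection limit the server is configured with.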