BPF is going to change so many things... Right now I'm having a lot of trouble with the tooling, but hey, we can always write BPF bytecode by hand or with a macro assembler. Scale back the ambitions...
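If you do go the by-hand route, a macro assembler can be as thin as a few C macros. The sketch below assumes the standard eBPF instruction encoding: an 8-bit opcode, the dst register in the low nibble and src in the high nibble of the second byte (little-endian layout), a 16-bit offset and a 32-bit immediate. The macro names imitate the kernel's BPF_MOV64_IMM-style helpers in include/linux/filter.h but are local, hypothetical definitions. run() is only a toy interpreter for four opcodes, not the verifier or a JIT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as struct bpf_insn in <linux/bpf.h>, but the 4-bit dst/src
 * registers are packed by hand into one byte (dst = low nibble, src = high
 * nibble, the little-endian layout) instead of relying on bit-field order. */
struct insn {
    uint8_t code; /* opcode: class | operation | source */
    uint8_t regs; /* (src << 4) | dst */
    int16_t off;  /* branch/memory offset, unused below */
    int32_t imm;  /* immediate operand */
};
static_assert(sizeof(struct insn) == 8, "eBPF instructions are 8 bytes");

/* Opcode fragments; values match the kernel's BPF_* constants. */
#define CLS_JMP   0x05
#define CLS_ALU64 0x07
#define OP_ADD    0x00
#define OP_MOV    0xb0
#define OP_EXIT   0x90
#define SRC_K     0x00 /* operand is imm */
#define SRC_X     0x08 /* operand is the src register */

/* The "macro assembler" (hypothetical names): one initializer per instruction.
 * Fields not named are zero-initialized. */
#define MOV64_IMM(dst, val) { .code = CLS_ALU64 | OP_MOV | SRC_K, .regs = (dst), .imm = (val) }
#define ADD64_IMM(dst, val) { .code = CLS_ALU64 | OP_ADD | SRC_K, .regs = (dst), .imm = (val) }
#define MOV64_REG(dst, src) { .code = CLS_ALU64 | OP_MOV | SRC_X, .regs = ((src) << 4) | (dst) }
#define EXIT_INSN()         { .code = CLS_JMP | OP_EXIT }

/* Serialize n instructions into 8*n little-endian bytes. */
static size_t emit(const struct insn *p, size_t n, uint8_t *out) {
    for (size_t i = 0; i < n; i++) {
        uint16_t off = (uint16_t)p[i].off;
        uint32_t imm = (uint32_t)p[i].imm;
        uint8_t *b = out + 8 * i;
        b[0] = p[i].code;
        b[1] = p[i].regs;
        b[2] = (uint8_t)off;
        b[3] = (uint8_t)(off >> 8);
        for (int k = 0; k < 4; k++)
            b[4 + k] = (uint8_t)(imm >> (8 * k));
    }
    return 8 * n;
}

/* Toy interpreter for just the four opcodes above, to sanity-check a
 * hand-written program. Returns 0 and stores r0 on EXIT, -1 otherwise. */
static int run(const struct insn *p, size_t n, uint64_t *r0) {
    uint64_t r[11] = {0}; /* r0..r10 */
    for (size_t pc = 0; pc < n; pc++) {
        unsigned dst = p[pc].regs & 0x0f, src = p[pc].regs >> 4;
        uint64_t k = (uint64_t)(int64_t)p[pc].imm; /* imm is sign-extended */
        if (dst > 10 || src > 10)
            return -1;
        switch (p[pc].code) {
        case CLS_ALU64 | OP_MOV | SRC_K: r[dst] = k; break;
        case CLS_ALU64 | OP_MOV | SRC_X: r[dst] = r[src]; break;
        case CLS_ALU64 | OP_ADD | SRC_K: r[dst] += k; break;
        case CLS_JMP | OP_EXIT: *r0 = r[0]; return 0;
        default: return -1; /* opcode not handled by this toy */
        }
    }
    return -1; /* fell off the end without EXIT */
}
```

For example, `{ MOV64_IMM(1, 40), ADD64_IMM(1, 2), MOV64_REG(0, 1), EXIT_INSN() }` leaves 42 in r0. On a little-endian host the bytes from emit() follow the layout bpf(2) expects in the insns array for BPF_PROG_LOAD; the verifier still gets the last word.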
I'm also wondering whether we should rethink language runtimes for this. For example: write everything in SPARK (so all the specs are statically checked), then target BPF bytecode through GNAT LLVM. OK, so now you've written the equivalent of a CUDA kernel or a tbb::flow node. For the chaining, you'd have a toolbox of task chainers (barriers, priority queues, routers...), and you'd never even enter userland? I'm thinking /many/ programs could be described this way.
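In today's BPF, the closest existing chaining mechanism is probably the tail call: bpf_tail_call() jumps into another program stored in a BPF_MAP_TYPE_PROG_ARRAY map, and the kernel caps how deep such a chain can go. The sketch below is only a hypothetical userspace model of that idea, with a "router" stage picking the next stage per event. The stage names, the ctx struct and MAX_HOPS are made up for illustration. One deliberate difference: here a bad index aborts the chain, whereas a failed bpf_tail_call simply falls through and the calling program continues.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-event context shared by every stage of the chain. */
struct ctx {
    uint32_t key;  /* what the router dispatches on */
    uint64_t acc;  /* running result */
    unsigned hops; /* number of stages executed */
};

/* Each stage does its work and returns the index of the next stage, or
 * STAGE_DONE. Returning an index stands in for a tail call (in BPF the jump
 * replaces the current program and never comes back). */
#define STAGE_DONE (-1)
typedef int (*stage_fn)(struct ctx *);

enum { ST_ROUTER, ST_DOUBLE, ST_INCR, N_STAGES };

static int stage_router(struct ctx *c) { return (c->key & 1) ? ST_INCR : ST_DOUBLE; }
static int stage_double(struct ctx *c) { c->acc *= 2; return ST_INCR; }
static int stage_incr(struct ctx *c)   { c->acc += 1; return STAGE_DONE; }

/* Plays the role of a BPF_MAP_TYPE_PROG_ARRAY: index -> program. */
static const stage_fn table[N_STAGES] = {
    [ST_ROUTER] = stage_router,
    [ST_DOUBLE] = stage_double,
    [ST_INCR]   = stage_incr,
};

/* Depth cap so a mis-wired graph cannot loop forever. 8 is an arbitrary
 * illustrative value, not the kernel's tail-call limit. */
#define MAX_HOPS 8

/* Runs the chain from `start`. Returns 0 when a stage reports STAGE_DONE,
 * -1 on a bad index or when the depth cap is hit. */
static int run_chain(struct ctx *c, int start) {
    for (int cur = start; cur != STAGE_DONE; ) {
        if (cur < 0 || cur >= N_STAGES || c->hops >= MAX_HOPS)
            return -1;
        assert(table[cur] != NULL); /* every slot is filled above */
        c->hops++;
        cur = table[cur](c);
    }
    return 0;
}
```

The router is the part that matters here: which stages run is decided per event, with no round trip to userland. Barriers or priority queues would be further stages plus shared state (maps, in real BPF).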