Because I think there is no way to avoid concepts like thread_id.
I'm curious how GPU programming can be made (a lot) simpler than CUDA.
So instead of writing a kernel with thread_id:
let c = gpu_add(a, b) let total = gpu_sum(c)
So instead of writing a kernel with thread_id:
The thread indexing is still there — just handled by the runtime, like how Python hides pointer math.