by mr_octopus
126 days ago
Thanks for trying it! :)

Each gpu_* call emits SPIR-V and dispatches via Vulkan compute. Data stays resident in VRAM between calls, so there are no round-trips to the CPU unless you need the result.

No thread_id is exposed. The runtime handles thread indexing internally: gpu_add(a, b) means "one thread per element, each does a[i] + b[i]." Workgroup sizing and dispatch dimensions are automatic.

The tradeoff: you can't write custom kernels with shared memory or warp-level ops. OctoFlow targets the 80% of GPU work that's embarrassingly parallel. For the other 20% you still want CUDA/Vulkan directly.

Cheers
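
For anyone who wants to picture the "one thread per element" model, here is a rough CPU-side sketch in Python. It is not OctoFlow code and not how the runtime is actually implemented: `emulated_gpu_add`, `add_kernel`, `dispatch_dims`, and the workgroup size of 256 are all made-up names and values for illustration. It only shows the usual pattern of rounding the dispatch up to whole workgroups and guarding the tail.

```python
# Hypothetical illustration only: none of these names come from OctoFlow.
WORKGROUP_SIZE = 256  # assumed value; the real runtime picks its own sizing


def dispatch_dims(n, workgroup_size=WORKGROUP_SIZE):
    """Number of workgroups needed to cover n elements (ceiling division)."""
    return (n + workgroup_size - 1) // workgroup_size


def add_kernel(thread_id, a, b, out):
    """Work done by one logical thread: a single element, bounds-checked."""
    if thread_id < len(out):  # threads past the end of the data do nothing
        out[thread_id] = a[thread_id] + b[thread_id]


def emulated_gpu_add(a, b, workgroup_size=WORKGROUP_SIZE):
    """Sequential CPU emulation of an elementwise add dispatch."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = [0] * len(a)
    for group in range(dispatch_dims(len(a), workgroup_size)):
        for local in range(workgroup_size):
            # Global thread index, the value a runtime would compute for you.
            add_kernel(group * workgroup_size + local, a, b, out)
    return out


if __name__ == "__main__":
    print(emulated_gpu_add([1, 2, 3], [10, 20, 30]))
```

On a real GPU the two loops run in parallel rather than in sequence; the bounds check is what lets the dispatch size be a whole number of workgroups even when the element count is not.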