Hacker News new | ask | show | jobs
by mr_octopus 126 days ago
Thanks for trying it! :)

Each gpu_* call emits SPIR-V and dispatches via Vulkan compute. Data stays resident in VRAM between calls — no round-trips to CPU unless you need the result.

No thread_id exposed. The runtime handles thread indexing internally — gpu_add(a, b) means "one thread per element, each does a[i] + b[i]." Workgroup sizing and dispatch dimensions are automatic.

The tradeoff: you can't write custom kernels with shared memory or warp-level ops. OctoFlow targets the 80% of GPU work that's embarrassingly parallel. For the other 20% you still want CUDA/Vulkan directly.

Cheers