by giancarlostoro
28 days ago
I was researching whether there is much benefit to using Rust or C++ over Python for AI, and it turns out the GPU doesn't care once the instructions are loaded, because what runs on the GPU is an entirely different spec. The only thing you might save on is the "startup" cost of getting your code onto the GPU, I guess? I assume that time cost is minuscule though; once it's all in memory, nobody cares that you spent any time "booting it up" any more than they care how long Windows takes to boot these days.
|
Tool calling, searches, cache movement (if used), and even debug steps all stall the GPU while it waits for the CPU.
There was a test that turned one of the under-1B Qwen3+ models into a single kernel that runs as one GPU pass without stalling on the CPU, and it saw quite a bit of perf lift over vLLM, I believe, which shows this is still an issue.
It's been a month, so I don't remember more details than this.
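To make the stall argument concrete, here is a rough back-of-the-envelope model in Python. Everything in it is an assumption for illustration: the function name `total_time_us`, the per-kernel time, and the host gap are made-up numbers, not measurements from that Qwen test or from vLLM. It also ignores things real runtimes do to hide launch latency, such as asynchronous launches and CUDA graphs, so treat it as a sketch of why small models are sensitive to CPU round trips and nothing more.

```python
def total_time_us(tokens: int, kernels_per_token: int,
                  kernel_us: float, host_gap_us: float,
                  fused: bool) -> float:
    """Crude wall-clock estimate for generating `tokens` tokens.

    kernel_us   -- hypothetical GPU time per kernel
    host_gap_us -- hypothetical GPU idle time per launch while the CPU
                   prepares and submits the next piece of work
    fused       -- if True, assume one launch covers the whole run
                   (a single persistent kernel), so the gap is paid once
    """
    compute = tokens * kernels_per_token * kernel_us
    launches = 1 if fused else tokens * kernels_per_token
    return compute + launches * host_gap_us


if __name__ == "__main__":
    # Made-up numbers: a small model with many tiny kernels per token.
    unfused = total_time_us(100, 200, kernel_us=5.0, host_gap_us=5.0, fused=False)
    fused = total_time_us(100, 200, kernel_us=5.0, host_gap_us=5.0, fused=True)
    print(f"unfused: {unfused:.0f} us, fused: {fused:.0f} us, "
          f"speedup: {unfused / fused:.2f}x")
```

In this model the gap only matters when it is comparable to the kernel time, which is why the effect would show up most on tiny models: with `kernel_us` at 500 instead of 5, the same 5 us gap is about 1% overhead.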