by jaberjaber23
262 days ago
Testing CUDA kernels on different GPUs costs $7k/month in cloud rentals, so I built an emulator instead.

You give it a kernel, and it predicts execution time on any GPU without running it. H100, A100, V100, whatever.

How: scraped specs for 50+ NVIDIA GPUs, built a tile-based simulator that models memory bandwidth, occupancy, and SM scheduling. Validated against 12 real GPUs; the mean error is 1.2%.

Doesn't work for: dynamic parallelism, multi-GPU, tiny kernels under 1 µs. I will figure those out soon.

Has anyone solved this differently?
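For anyone wondering what "models memory bandwidth, occupancy, and SM scheduling" can look like at the crudest level, here is a rough sketch. It is not the emulator's code and is far simpler than a tile-based simulator: it is a plain roofline estimate with a thread-limited occupancy term and a wave (tail-effect) correction. `GpuSpec`, `KernelProfile`, and `estimate_time` are hypothetical names, and the spec numbers are made-up round values, not any real GPU.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSpec:
    """Per-GPU numbers a spec scrape would provide (placeholder values below)."""
    sm_count: int
    peak_flops: float          # FLOP/s
    mem_bandwidth: float       # bytes/s
    max_threads_per_sm: int
    max_blocks_per_sm: int


@dataclass(frozen=True)
class KernelProfile:
    """Static description of one kernel launch (assumed known in advance)."""
    flops: float               # total floating-point operations
    bytes_moved: float         # total bytes read from and written to DRAM
    num_blocks: int
    threads_per_block: int


def blocks_per_sm(gpu: GpuSpec, k: KernelProfile) -> int:
    """Resident blocks per SM, limited only by thread and block counts.

    Registers and shared memory are ignored here; a real model needs them.
    """
    by_threads = gpu.max_threads_per_sm // k.threads_per_block
    if by_threads == 0:
        raise ValueError("block is too large to fit on one SM")
    return min(by_threads, gpu.max_blocks_per_sm)


def occupancy(gpu: GpuSpec, k: KernelProfile) -> float:
    """Fraction of an SM's thread slots filled by resident blocks."""
    return blocks_per_sm(gpu, k) * k.threads_per_block / gpu.max_threads_per_sm


def estimate_time(gpu: GpuSpec, k: KernelProfile) -> float:
    """Roofline time in seconds, inflated by the partially filled last wave."""
    concurrent = blocks_per_sm(gpu, k) * gpu.sm_count
    waves = math.ceil(k.num_blocks / concurrent)
    # Roofline: the kernel is bound by whichever resource takes longer.
    ideal = max(k.flops / gpu.peak_flops, k.bytes_moved / gpu.mem_bandwidth)
    # Tail effect: a last wave that is not full still costs a whole wave.
    utilization = k.num_blocks / (waves * concurrent)
    return ideal / utilization


# Made-up GPU and kernel, for illustration only.
gpu = GpuSpec(sm_count=100, peak_flops=10e12, mem_bandwidth=1e12,
              max_threads_per_sm=2048, max_blocks_per_sm=32)
kernel = KernelProfile(flops=2e9, bytes_moved=4e9,
                       num_blocks=1000, threads_per_block=256)

print(f"occupancy: {occupancy(gpu, kernel):.2f}")
print(f"estimated time: {estimate_time(gpu, kernel) * 1e3:.2f} ms")
```

In this toy setup the kernel is memory-bound (4 ms at full bandwidth), and 1000 blocks on 800 concurrent slots means two waves with the second mostly empty, which pushes the estimate to 6.4 ms. A model this coarse will not get near 1.2% error; it is only meant to show where bandwidth, occupancy, and scheduling enter the estimate.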