Hacker News new | ask | show | jobs
by rfw300 90 days ago
Yeah, assuming there's no active monitoring during the training runs, you can trivially give the agent an abstraction which turns "1 GPU" into "16 GPUs" that just so happens to take 16x the wall-clock time to run.
1 comments

In fact, looking at the blog post, the agent orchestrating 16 GPUs is half as efficient as the agent using 1 GPU in GPU-time. Since it uses 16 GPUs to reach the same result as 1 GPU in 1/8 of the time.