by pbkhrv
86 days ago
> How parallelism changed the agent’s research strategy
> With a single GPU, the agent is stuck doing greedy hill-climbing: try one thing, check the result, pick a direction, try the next thing. With 16 GPUs, the strategy shifts. ...skip... 12 experiments in a single 5-minute wave. This makes it much harder to get stuck in local optima and much easier to find interaction effects between parameters.

The agent could, in theory, come up with a protocol that runs those same 12 experiments one by one and only then decides which branch to explore next, which I think would lead to the same outcome? In this case, though, it seems to have stumbled on this outcome only because it didn't get a chance to execute a greedy strategy after the first 1 or 2 results. Worse experiment design + parallelism = better experiment design + serialized execution?
At least in theory, adaptiveness should save samples and, in this case, compute. (As noted, you can always turn the parallel run into a serial one, so the serial approach, which gets information 'from the future', should be able to meet or beat any parallel approach on sample-efficiency.)
So if the batch only matches the adaptive search, that suggests the LLM is not reasoning well in the adaptive setting and is poorly exploiting the additional information. Maybe it needs some sort of more explicit counterfactual reasoning/planning over a tree of possible outcomes?
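To make the "serial can always emulate parallel" point concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration: `toy_loss` stands in for one 5-minute training run, the 12 configs are a made-up learning-rate grid, and `good_enough` is a hypothetical stopping threshold. It is not the agent's actual protocol, only a minimal instance of the argument.

```python
from typing import Callable, List, Tuple


def toy_loss(lr: float) -> float:
    # Hypothetical stand-in for one training run; the minimum is at lr = 0.3.
    return (lr - 0.3) ** 2


def run_batch(configs: List[float],
              evaluate: Callable[[float], float]) -> Tuple[float, int]:
    """One parallel wave: every config is evaluated before any decision."""
    losses = [evaluate(c) for c in configs]
    best = min(range(len(configs)), key=losses.__getitem__)
    return configs[best], len(configs)


def run_serial(configs: List[float],
               evaluate: Callable[[float], float],
               good_enough: float) -> Tuple[float, int]:
    """Same configs, one at a time; stops once a result is good enough."""
    best_cfg, best_loss, used = configs[0], float("inf"), 0
    for c in configs:
        loss = evaluate(c)
        used += 1
        if loss < best_loss:
            best_cfg, best_loss = c, loss
        if best_loss <= good_enough:
            break  # adaptive step: later runs are skipped
    return best_cfg, used


configs = [i / 10 for i in range(12)]  # 12 experiments, as in the quoted wave

batch_best, batch_used = run_batch(configs, toy_loss)
# A threshold that can never be met makes the serial run replay the batch.
serial_best, serial_used = run_serial(configs, toy_loss, good_enough=-1.0)
# A reachable threshold lets the serial run use what it has already seen.
early_best, early_used = run_serial(configs, toy_loss, good_enough=1e-6)

print(batch_best, batch_used)
print(serial_best, serial_used)
print(early_best, early_used)
```

With an unreachable threshold the serial run evaluates exactly the same 12 configs and picks the same winner, so it can never be worse on sample count. With a reachable one it stops early. The price is wall-clock time: the serial runs happen back to back instead of in a single wave, which is the trade-off the thread is about.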