Hacker News
by dvt 87 days ago
Ok, so looking at the commit log[1], I was mostly interested in seeing what the "moonshot ideas" implementations looked like, but basically everything is just hyperparameter tuning. Which is nice, but likely not worth the $$$ spent on the tokens. Am I missing something here?

[1] https://github.com/ykumards/eCLIP/commits/main/autoresearch


It would seem wise to modify the autoresearch instructions so that the agent first rigorously estimates the computational cost of each proposal, then sorts and compares the proposals for human review, and, for each attempt that is actually executed, feeds the measured computational cost back via a LoRA adapter?

i.e. perhaps minimal changes to autoresearch could keep costs under control and make cost-effective research possible.
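A minimal sketch of the cost-aware triage suggested above, assuming the agent attaches a GPU-hour estimate and an expected metric gain to every proposal. `Proposal`, `cost_scale` and `rank_for_review` are hypothetical names, and the LoRA feedback step is replaced by a much simpler stand-in: rescaling future estimates by the average ratio of actual to estimated cost from past runs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Proposal:
    # Hypothetical record for one experiment the agent wants to run.
    description: str
    est_gpu_hours: float  # agent's up-front cost estimate
    est_gain: float       # agent's guess at the metric improvement


def cost_scale(history):
    """Average ratio of actual to estimated cost over past runs (1.0 with no history).

    A simple calibration stand-in for the LoRA feedback idea, not an implementation of it.
    history is a list of (estimated_gpu_hours, actual_gpu_hours) pairs.
    """
    if not history:
        return 1.0
    return mean(actual / estimated for estimated, actual in history)


def rank_for_review(proposals, budget_gpu_hours, scale=1.0):
    """Rank by estimated gain per calibrated GPU-hour, then greedily fill the budget."""
    ranked = sorted(
        proposals,
        key=lambda p: p.est_gain / (p.est_gpu_hours * scale),
        reverse=True,
    )
    selected, spent = [], 0.0
    for p in ranked:
        cost = p.est_gpu_hours * scale
        if spent + cost <= budget_gpu_hours:
            selected.append(p)
            spent += cost
    return selected


proposals = [
    Proposal("A", est_gpu_hours=10.0, est_gain=0.5),
    Proposal("B", est_gpu_hours=2.0, est_gain=0.2),
    Proposal("C", est_gpu_hours=5.0, est_gain=0.1),
]
print([p.description for p in rank_for_review(proposals, 12.0)])  # ['B', 'A']

# Past runs took 1.5x longer than the agent estimated.
scale = cost_scale([(2.0, 3.0), (4.0, 6.0)])
print([p.description for p in rank_for_review(proposals, 12.0, scale)])  # ['B', 'C']
```

The greedy gain-per-hour selection is a heuristic, not an optimal knapsack solution; its appeal is that the ranked list is easy for a human reviewer to read and veto.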

Yes but at that point you may as well use a proper hyperparameter tuning framework like optuna if all the LLM agent is supposed to do is do hyperparameter tuning.
Does Optuna think abstractly (i.e. use an LLM to interpret the code and come up with insights), or does it just run hyperparameter tuning experiments on user-specified parameters?
The latter, but it uses fairly sophisticated search algorithms (by default a Tree-structured Parzen Estimator sampler) to select the most promising candidates.

If you look at the commits, you can see that all it does is set different values for various continuous parameters: the kind of thing where I trust statistics a lot more than reasoning. Optuna can make very informed decisions while changing many parameters at once, gradually converging towards optimal values, whereas the LLM seems to be throwing stuff at a wall to see what sticks.

What would work best is if the LLM approached things at a higher level, i.e. used Optuna for the tuning but reasoned about better approaches to the algorithms and/or data or whatever. Instead, it ends up tuning parameters manually, only one or a few at a time, which is extremely inefficient and unlikely to be optimal.

but you said

> Yes but at that point you may as well use a proper hyperparameter tuning framework like optuna if all the LLM agent is supposed to do is do hyperparameter tuning.

whereas the "novelty" of autoresearch is that it can reason symbolically about the computation, analyze the codebase, etc., i.e. a wider (and harder) search space, but with symbolic reasoning.

Optuna and skopt (scikit-optimize) are open source, and they won't take any GPU time at all to do it.
Optuna requires exploring the hyperparameter space which means running the experiments with those hyperparameters.

For a fixed search space it will almost certainly be better though.