| HN Mirror



	by stingraycharles 27 days ago
	Am I the only one who wasn’t particularly impressed by AutoResearch? If you looked at what the agent was actually doing, it was mostly just tuning parameters, not really trying novel approaches. I couldn’t help but consider this a very inefficient variant of hyperparameter optimization. Someone correct me if I’m wrong; I may be looking at this too pessimistically.

7 comments

lacker 27 days ago

I didn't dig into what the actual repository was doing, but personally, I took some inspiration from the idea after reading about it and realizing that I might have been underestimating the ability of LLMs. I put a bit more work into a performance harness I was using locally and just set some agents to brainstorming and they did seem to find some great stuff. So I don't really have a stance one way or another on this specific repo, but the general idea seems like a really good one.


delis-thumbs-7e 26 days ago

Could you elaborate on specifically how you had been underestimating models? Do you mean just using a tighter harness to make them work in a structured, agentic way, or something else?


lacker 26 days ago

For the specific code I was working on, I had a general idea of the sort of performance improvement that would be possible. I just thought that it would be too hard for the models to figure out without a lot of hand-holding.

But it turned out to be not "too hard ever", but more like: in 1 out of every 5 tries, the model did in fact manage to get a large refactoring to the point where it improved performance. So I set it up to try something, run the perf test, see if it worked, and if not, throw it away and repeat. Then it started, slowly, finding some useful things.
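
A minimal sketch of that try, measure, keep-or-discard loop, for illustration only. This is not the harness described above: `keep_if_faster`, `propose`, and `measure` are hypothetical names, with `propose` standing in for an agent attempting a change and `measure` standing in for a real perf test.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def keep_if_faster(
    start: T,
    propose: Callable[[T], T],
    measure: Callable[[T], float],
    attempts: int,
) -> Tuple[T, float, int]:
    """Greedy loop: keep a candidate only if it measures faster than the best so far.

    `propose` is a hypothetical stand-in for an agent attempting a change;
    `measure` is a hypothetical stand-in for a perf test (lower is better).
    """
    best, best_time = start, measure(start)
    kept = 0
    for _ in range(attempts):
        candidate = propose(best)
        elapsed = measure(candidate)
        if elapsed < best_time:
            # The attempt improved performance: keep it as the new baseline.
            best, best_time = candidate, elapsed
            kept += 1
        # Otherwise the candidate is simply thrown away and we try again.
    return best, best_time, kept


if __name__ == "__main__":
    # Toy example: the "program" is a number and its "runtime" is its distance from 0.
    result = keep_if_faster(5, lambda x: x - 1, lambda x: float(abs(x)), attempts=10)
    print(result)
```

Most attempts can fail without hurting anything, since a failed attempt costs only the time it took to measure it.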


inciampati 26 days ago

Just remember that they will do clever but useless things to improve the metric, like changing the random seed, as per autoresearch's hero image. lol! imo, out-of-the-box thinking is needed.
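
One common guard against that kind of seed gaming is to have the harness own the seeds and score every candidate on the same fixed set. A minimal sketch under stated assumptions: `noisy_score`, `robust_score`, and `EVAL_SEEDS` are hypothetical names, and the Gaussian noise model is made up for illustration rather than taken from autoresearch.

```python
import random
import statistics
from typing import Sequence

# Fixed by the harness; the agent under test cannot choose or change these.
EVAL_SEEDS = tuple(range(32))


def noisy_score(quality: float, seed: int) -> float:
    """Hypothetical metric: the candidate's true quality plus seed-dependent noise."""
    rng = random.Random(seed)
    return quality + rng.gauss(0.0, 1.0)


def robust_score(quality: float, seeds: Sequence[int] = EVAL_SEEDS) -> float:
    """Average the metric over the same seeds for every candidate."""
    return statistics.mean(noisy_score(quality, s) for s in seeds)


if __name__ == "__main__":
    # With a single seed, a lucky draw can look like an improvement.
    single = [noisy_score(0.0, s) for s in EVAL_SEEDS]
    print("spread across seeds for the same candidate:", max(single) - min(single))
    # With shared seeds, the noise cancels when comparing two candidates.
    print("gap between candidates:", robust_score(1.0) - robust_score(0.0))
```

Because both candidates see identical noise draws, the difference in their averaged scores reflects only the difference in quality.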


druub 26 days ago

Ever since AlphaEvolve, the idea has been that if you build a harness that can evaluate solutions, and give LLMs a database where they can keep storing their work and then sample from it, they do find non-trivial solutions over time, learning from their own past ideas.

It is the ultimate manifestation of test-time scaling. I think Karpathy just popularised it.
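
A toy sketch of that evaluate, store, and sample loop, assuming a numeric candidate and a made-up objective. It is not AlphaEvolve's actual algorithm: `evaluate`, `mutate`, and `evolve` are hypothetical names, and `mutate` stands in for an LLM proposing a variant of a stored solution.

```python
import random
from typing import List, Tuple


def evaluate(x: float) -> float:
    """Hypothetical objective with its peak at x = 3 (higher is better)."""
    return -((x - 3.0) ** 2)


def mutate(rng: random.Random, parent: float) -> float:
    """Stand-in for an LLM proposing a variant of a stored solution."""
    return parent + rng.uniform(-0.5, 0.5)


def evolve(generations: int, seed: int = 0, keep: int = 20) -> Tuple[float, float]:
    """Return the best (score, candidate) pair found."""
    rng = random.Random(seed)
    database: List[Tuple[float, float]] = [(evaluate(0.0), 0.0)]
    for _ in range(generations):
        # Sample from past work, biased toward better entries (tournament of 3).
        contenders = [rng.choice(database) for _ in range(3)]
        parent = max(contenders)[1]
        child = mutate(rng, parent)
        # Store every evaluated idea, then keep only the top `keep` entries.
        database.append((evaluate(child), child))
        database.sort(reverse=True)
        del database[keep:]
    return database[0]


if __name__ == "__main__":
    score, candidate = evolve(500)
    print(f"best candidate {candidate:.3f} with score {score:.4f}")
```

Keeping a pool of past solutions rather than a single best one lets later proposals build on several lines of work at once.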


clbrmbr 27 days ago

Karpathy embedded within an organization is way more impressive than him out on his own with hot takes and little projects. I hope he does great things for Anthropic.


stingraycharles 26 days ago

Absolutely, I wasn’t saying that him being at Anthropic wasn’t going to be effective, I just think his little projects wouldn’t be very interesting if his name wasn’t attached to them.


latentsea 26 days ago

I was impressed that I was able to take the same basic idea and apply it to anything that Claude could construct a metric for. It's nice being able to just run /autoresearch and speed up your test suites, shave time off your builds, etc.

It's a decent tool to have in the toolbox.


vdelpuerto 26 days ago

I was trying to look at options outside the box (everything is more context or RAG) and have been using this approach for about a month with good results. https://github.com/VDP89/fscars


teravor 27 days ago

    > Am I the only one who wasn’t particularly impressed by AutoResearch?

isn't it just a nerfed AlphaEvolve? https://arxiv.org/abs/2506.13131


DesaiAshu 27 days ago

Inefficient variants with $100m+ worth of compute will still probably outperform the best team of researchers


godelski 26 days ago

That's not the question. The question is how much you need to give the best team of researchers to beat $100m+ worth of compute. $1m of compute? $10m? Clearly giving the best team $100m is going to beat out giving an inefficient group $100m. It does in fact matter who you throw your money at...
