by viraptor 322 days ago
https://arxiv.org/abs/2409.04109

> we obtain the first statistically significant conclusion on current LLM capabilities for research ideation: we find LLM-generated ideas are judged as more novel (p < 0.05) than human expert ideas while being judged slightly weaker on feasibility.

It's a bit better than just finding related pairs. And that's with Sonnet 3.5, which is basically ancient at this point.


This paper centers "novelty", but it also finds that human ideas are more feasible, that LLM-generated ideas lack diversity, and that LLMs cannot reliably evaluate ideas. None of the ideas were actually evaluated by running experiments, either.

Pretty much what I would expect. The paper also seems to be doing exactly what I described, so how is the technique better than that?