| HN Mirror



	by jackblemming 252 days ago
	Seems cute, but ultimately not very valuable without benchmarks or some kind of evaluation. For all I know, this could make Claude worse.

2 comments

jelling 252 days ago

Same. We've all fooled ourselves into believing that an LLM / stochastic-process problem was finally solved based on a good result. But the sample size is always too low to be meaningful.
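To put rough numbers on the sample-size point, here is a minimal sketch (not from the thread) that computes an approximate 95% Wilson score interval for a pass rate. The function name `wilson_interval` and the 9/10 vs 7/10 counts are made up for illustration, not real results.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate Wilson score interval for a success rate (z=1.96 ~ 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


# Hypothetical counts: "with the trick" passes 9/10 runs, "without" passes 7/10.
with_trick = wilson_interval(9, 10)
without_trick = wilson_interval(7, 10)

# The intervals overlap heavily, so 10 runs cannot separate the two setups.
overlap_small_n = with_trick[0] < without_trick[1]

# The same pass rates over 1000 runs each give intervals that no longer overlap.
with_trick_big = wilson_interval(900, 1000)
without_trick_big = wilson_interval(700, 1000)
overlap_big_n = with_trick_big[0] < without_trick_big[1]

print(with_trick, without_trick, overlap_small_n)
print(with_trick_big, without_trick_big, overlap_big_n)
```

Overlapping intervals are only a crude check, not a proper significance test, but they show why a handful of good runs says very little.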


anuramat 252 days ago

even if it works as described, I'm assuming it's extremely model-dependent (e.g. book prerequisites), so you'd have to re-run this for every model you use. this is basically a poor man's fine-tuning.

maybe explicit support from providers would make it feasible?
