Hacker News
by freehorse 259 days ago
To do that properly, you need some kind of control, which is hard to get with one person. It should be doable with proper effort, but it is far from trivial: it is not enough to measure what you actually did in one condition, you have to compare it with something. And with n=1 there can be a lot of noise: maybe you just happen to get harder tasks during the period when you use LLMs. So you need to measure over quite a long time, or at least make sure the difficulty of the tasks is similar. If you have many people, you can split them into groups instead and not care as much about these factors, because you can assume that this "noise" will cancel out when you average.
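To make the averaging point concrete, here is a rough simulation sketch. Everything in it is an assumption for illustration, not data: the 10% speedup, the 10-hour baseline, the difficulty spread, and the function names are all made up. It repeats a control-vs-LLM comparison many times and looks at how much the estimated time saving jumps around for a given group size.

```python
import random
import statistics

# All three constants are hypothetical values chosen for illustration.
TRUE_SPEEDUP = 0.10    # assume LLM use cuts task time by 10%
BASE_HOURS = 10.0      # assume an average task takes 10 hours without an LLM
DIFFICULTY_SD = 3.0    # assume task difficulty varies by about 3 hours


def task_time(rng, with_llm):
    """Duration of one task: random difficulty, scaled down if an LLM is used."""
    hours = max(0.5, rng.gauss(BASE_HOURS, DIFFICULTY_SD))
    return hours * (1 - TRUE_SPEEDUP) if with_llm else hours


def estimated_saving(rng, n_per_group):
    """Difference in mean task time between a control group and an LLM group."""
    control = [task_time(rng, False) for _ in range(n_per_group)]
    treated = [task_time(rng, True) for _ in range(n_per_group)]
    return statistics.mean(control) - statistics.mean(treated)


def spread_of_estimates(n_per_group, repeats=2000, seed=0):
    """Standard deviation of the estimated saving over many repeated experiments."""
    rng = random.Random(seed)
    estimates = [estimated_saving(rng, n_per_group) for _ in range(repeats)]
    return statistics.stdev(estimates)


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(f"n per group = {n:3d}: spread of estimate = {spread_of_estimates(n):.2f} h")
```

Under these assumptions the true saving is about one hour per task. With one task per condition the estimate swings by several hours, so the difficulty noise swamps the effect. With larger groups the spread shrinks roughly with the square root of the group size, which is the "noise cancels out when you average" part.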