|
|
|
|
|
by igorkraw
337 days ago
|
|
Could you either release the dataset (raw but anonymized) for independent statistical évaluation or at least add the absolute times of each dev per task to the paper? I'm curious what the absolute times of each dev with/without AI was and whether the one guy with lots of Cursor experience was actually faster than the rest of just a slow typer getting a big boost out of llms Also, cool work, very happy to see actually good evaluations instead of just vibes or observational stuies that don't account for the Hawthorne effect |
|
We'll be releasing anonymized data and some basic analysis code to replicate core results within the next few weeks (probably next, depending).
Our GitHub is here (http://github.com/METR/) -- or you can follow us (https://x.com/metr_evals) and we'll probably tweet about it.