by jrochkind1
2 hours ago
> And, all of the bugs can be identified by several models if they are pointed directly at it and told what to look for.

This made me think, well, sure, if you tell them what to look for... but then:

> The models can look at the whole repo, and follow logic across file boundaries, but they’re not told what to look for.

So okay, the first one was an accidental misstatement?
In the benchmark, the models were told to look at the file and were allowed to look at the rest of the repo, with no clues about what to look for.

While selecting which mythos bugs to include, I needed judge models to be able to determine whether contestants had found the right bug, since I couldn't realistically judge hundreds of bug reports myself. So for that selection step only, the models were given the bug location and told to identify and explain it.