Hacker News
by jrochkind1 2 hours ago
> And, all of the bugs can be identified by several models if they are pointed directly at it and told what to look for.

This made me think, well, sure, if you tell them what to look for... but then:

> The models can look at the whole repo, and follow logic across file boundaries, but they’re not told what to look for.

So okay, the first one was an accidental mis-statement?

2 comments

You're mixing up corpus selection and the benchmark. I possibly could have explained better.

In the benchmark the models were told to look at the file and were allowed to look at the rest of the repo, with no clues about what to look for.

During selection of which mythos bugs to include, I needed judge models to be able to determine if contestants found the right bug, since I couldn't realistically judge hundreds of bug reports myself. So, they were given the bug location and told to identify and explain it.
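A rough sketch of the two phases described above, in case it helps. Every name here is hypothetical and this is not the actual harness; in particular the `min_judges=2` threshold is an assumption standing in for "several models". Selection hands the judge the bug location, while the benchmark prompt carries no hint at all.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Bug:
    file: str
    location: str  # hypothetical field, e.g. a function name or line range


# A judge is modeled as a function: given a prompt, did it explain the bug?
Judge = Callable[[str], bool]


def selection_prompt(bug: Bug) -> str:
    # Corpus selection: the judge IS told where the bug is.
    return f"There is a bug in {bug.file} at {bug.location}. Identify and explain it."


def benchmark_prompt(bug: Bug) -> str:
    # Benchmark: the contestant is NOT told what to look for.
    return (
        f"As part of a security audit, please audit {bug.file}. "
        "You are free to look at the rest of the repo for context."
    )


def select_corpus(bugs: List[Bug], judges: List[Judge], min_judges: int = 2) -> List[Bug]:
    # Keep a bug only if enough judges can explain it when given the hint.
    kept = []
    for bug in bugs:
        prompt = selection_prompt(bug)
        passed = sum(1 for judge in judges if judge(prompt))
        if passed >= min_judges:
            kept.append(bug)
    return kept
```

The point of the split is that only `selection_prompt` ever mentions the location; anything that survives `select_corpus` is then presented to contestants through `benchmark_prompt` alone.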

No. In the test they are not told what to look for. They are told “as part of a security audit, please audit this file. You are free to look at the rest of the repo for context.”

Outside of the test, they are told “can you find this bug in this file?”

Why are they being told anything outside of the test? What is that for? Isn't “can you find this bug in this file?” also a test? It sounds like there are two kinds of tests? I'm clearly confused, I realize.

They are told outside the test because if they can't find the bug when given hints, then it's safe to assume they won't find it given no hints. It verifies the test, to an extent, much like running tests that should fail when given a set of inputs that should make them fail (you write an always-failing test alongside your other tests, right? ;)
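For what it's worth, here is a minimal sketch of that "test that should fail" idea. The `finds_bug` checker and its keyword matching are made up for illustration and are not how the benchmark actually judges reports.

```python
def finds_bug(report: str, expected_keyword: str) -> bool:
    # Hypothetical checker: a naive, case-insensitive keyword match.
    return expected_keyword.lower() in report.lower()


def test_checker_accepts_correct_report():
    # Positive case: a report that names the bug should be accepted.
    assert finds_bug("Integer overflow in read_len()", "overflow")


def test_checker_rejects_unrelated_report():
    # Negative control: if this report were accepted, the checker would be
    # accepting anything and its passes would tell us nothing.
    assert not finds_bug("No issues found.", "overflow")
```

The second test is the interesting one: it only earns its keep by confirming that the checker can say no.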