| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by SkiFire13 686 days ago
	But this will lead you to the same problem the tweet is talking! You are training a reward model based on human feedback (whether the code satisfies the specification or not). This time the human feedback may seem more objective, but in the end it's still non-exhaustive human feedback which will lead to the reward model being vulnerable to some adversarial inputs which the other model will likely pick up pretty quickly.

1 comments

rafaelmn 686 days ago

It's based on automated tools and evaluation (test runner, coverage, lint) ?

link

SkiFire13 686 days ago

The input data is still human produced. Who decides what is code that follows the specification and what is code that doesn't? And who produces that code? Are you sure that the code that another model produces will look like that? If not then nothing will prevent you from running into adversarial inputs.

And sure, coverage and lints are objective metrics, but they don't directly imply the correctness of a test. Some tests can reach a high coverage and pass all the lint checks but still be incorrect or test the wrong thing!

Whether the test passes or not is what's mostly correlated to whether it's correct or not. But similarly for an image recognizer the prompt of whether an image is a flower or not is also objective and correlated, and yet researchers continue to find adversarial inputs for image recognizer due to the bias in their training data. What makes you think this won't happen here too?

link

rafaelmn 686 days ago

> The input data is still human produced

So are rules for the game of go or chess ? Specifying code that satisfies (or doesn't satisfy) is a problem statement - evaluation is automatic.

> but they don't directly imply the correctness of a test.

I'd be willing to bet that if you start with an existing coding model and continue training it with coverage/lint metrics and evaluation as feedback you'd get better at generating tests. Would be slow and figuring out how to build a problem dataset from existing codebases would be the hard part.

link

SkiFire13 686 days ago

> So are rules for the game of go or chess ?

The rules are well defined and you can easily write a program that will tell whether a move is valid or not, or whether a game has been won or not. This allows you generate virtually infinite amount of data to train the model on without human intervention.

> Specifying code that satisfies (or doesn't satisfy) is a problem statement

This would be true if you fix one specific program (just like in Go or Chess you fix the specific rules of the game and then train a model on those) and want to know whether that specific program satisfies some given specification (which will be the input of your model). But if instead you want the model to work with any program then that will have to become part of the input too and you'll have to train it an a number of programs which will have to be provided somehow.

> and figuring out how to build a problem dataset from existing codebases would be the hard part

This is the "Human Feedback" part that the tweet author talks about and the one that will always be flawed.

link