by godelski
492 days ago
I agree with you on the first part, but no, code is not easy to verify. I think you missed part of what I wrote: I mean verifying that your code is bug-free. That cannot be done purely through testing, and formal verification remains an unsolved problem in general.
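To make the testing point concrete, here is a toy sketch. The function and test names are made up for illustration. Every test passes, yet the function is still wrong for an input nobody thought to test:

```python
def is_leap_year(year: int) -> bool:
    # Buggy on purpose: ignores the century rule
    # (years divisible by 100 are leap years only if divisible by 400).
    return year % 4 == 0


def run_tests() -> bool:
    # A plausible-looking test suite that happens to miss the bug.
    cases = {2024: True, 2023: False, 2000: True, 2019: False}
    return all(is_leap_year(y) == expected for y, expected in cases.items())


print(run_tests())         # the suite is green
print(is_leap_year(1900))  # but 1900 was not a leap year
```

Passing tests only tell you about the inputs you tried, not about the ones you didn't.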
|
Another issue is how much data you can synthesize this way, constructing both the problem and the solution so that you know the answer before using it as a sample.
I.e., some problems are easy to create when you construct them yourself, but hard to solve with no prior knowledge. Could those be used as a scoring signal?
I.e., you are the oracle, and the model being trained doesn't know the answer, only whether it is right or wrong. But I don't know whether the reward function must be binary or can be on a scale.
Does that make sense, or is it wrong?
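Here is a rough sketch of what I have in mind, using factoring as a stand-in. Everything in it is an assumption for illustration: the names `make_problem`, `binary_reward`, and `graded_reward` are made up, and the primes are far too small to be hard in practice. The generator knows the answer because it built the problem from it, and the reward can be either right/wrong or partial credit:

```python
import random

# Toy prime pool; a real setup would need much larger primes to be hard.
SMALL_PRIMES = [101, 103, 107, 109, 113, 127, 131, 137, 139, 149]


def make_problem(rng: random.Random) -> tuple[int, tuple[int, int]]:
    """Build the problem from its solution: pick two primes, publish the product."""
    p, q = rng.sample(SMALL_PRIMES, 2)
    return p * q, (min(p, q), max(p, q))


def binary_reward(n: int, guess: tuple[int, int]) -> float:
    """1.0 only if the guess is a nontrivial factorization of n."""
    a, b = guess
    return 1.0 if a > 1 and b > 1 and a * b == n else 0.0


def graded_reward(n: int, guess: tuple[int, int]) -> float:
    """Same as binary, plus 0.5 if at least one guessed factor divides n."""
    if binary_reward(n, guess) == 1.0:
        return 1.0
    return 0.5 if any(1 < x < n and n % x == 0 for x in guess) else 0.0


rng = random.Random(0)
n, solution = make_problem(rng)
print(binary_reward(n, solution))          # full credit for the known answer
print(graded_reward(n, (solution[0], 2)))  # partial credit: one factor right
```

The model only ever sees `n` and the reward, never `solution`.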