by hedgehog
40 days ago
My suspicion is that a lot of the performance difference in newer models comes from more and better code reasoning and debugging tasks in the RL phase, along with actual security bug-finding workflows. When sessions get long and instruction-following gets less reliable, you start relying more on the model's baked-in behavior plus steering from the harness, both of which are still, in a way, products of human ingenuity. At least so far. For bug finding, I think there will be value in cost/performance tuning for a long time, and in hybrid techniques (smarter goal-oriented fuzzing, etc.).