by schipperai
2 days ago
Cognition did a good job documenting their approach [1]. TL;DR: they worked with OSS project maintainers to build the tasks, they score models on whether a PR is mergeable, and every task is graded by a human researcher. SoTA models still have hill-climbing to do, which raises the bar and inspires confidence. I'd say it's legit.

[1]: https://x.com/cognition/status/2064061031912288715