by mdasen
46 days ago
It's really interesting how much the AI harness seems to matter. Going from 48% via Google's official results to 65% is a huge jump. I feel like I'm constantly seeing results that compare models and rarely seeing results that compare harnesses. Is there a leaderboard out there comparing harness results using the same models?