by campbel 842 days ago
Opus got it correct for me. It seems the models give both correct and incorrect responses on this one. I think testing 1 question 1 time really isn't worth much as an accurate representation of capability.
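To put rough numbers on that, here is a minimal sketch using the Wilson score interval for a pass rate. The counts are made up for illustration (nothing here was measured on any model), and `wilson_interval` is just a helper name picked for this example.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a pass rate (z = 1.96)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical counts, for illustration only.
one_try = wilson_interval(1, 1)         # one question, asked once, answered correctly
twenty_tries = wilson_interval(14, 20)  # same question asked 20 times, 14 correct

print("1 of 1:   %.2f to %.2f" % one_try)
print("14 of 20: %.2f to %.2f" % twenty_tries)
```

With a single correct answer the interval runs from about 0.21 up to 1.0, so it is consistent with a model that gets the question right almost always or only about a fifth of the time. Repeating the same prompt narrows that range a lot, which is the point about a single run not saying much.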