Hacker News
by epolanski 4 hours ago
I think they simply optimize for end-to-end (E2E) benchmarks. None of those benchmarks is designed around multi-turn assistance to the user; they all go from a prompt straight to the final solution.