by breckenedge
425 days ago
> These […] get better every day.

They do, but I’ve seen a huge slowdown in “getting better” in the last year. I wonder if it’s my perception or reality. Each model does better on benchmarks, but I’m still experiencing at least a 50% failure rate on _basic_ task completion, and that number hasn’t improved in many months.