by MichealCodes
265 days ago
I really hope benchmarking improves soon so we can monitor a model in the weeks following its announcement. It really seems like these companies introduce a new "buffed" model and then slowly nerf its intelligence through optimizations. If we saw benchmark task performance in week 1 vs. week 8, it would at least give us more insight into the loop here. In an environment lacking true progress, a company could surely "show" progress with this strategy.