by mrandish
128 days ago
> Yeah, these benchmarks are bogus. It's not just over-fitting to leading benchmarks; there are also too many degrees of freedom in how a model is tested (harness, etc.). Until there's standardized documentation enabling independent replication, it's all just benchmarketing.