by rajveerb
30 days ago
I read through this blog post, and it's timely given how close the models are to maxing out the benchmarks/evals. One thing it didn't address, but which would be interesting to discuss, is conflicting benchmarks/evals. Are there desirable emergent behaviors that might not be optimized for because the evals penalize them?