by wfme
697 days ago
My experience reflects this too. My hunch is that GPT-4o was trained to game the benchmarks rather than to produce higher-quality content. In theory, the benchmarks should be a pretty close proxy for quality, but that doesn't match my experience at all.