by mountainriver 382 days ago
check the benchmarks or make one of your own
1 comment

I checked the BLEU score and perplexity of popular models, and both have stagnated since around 2021. As a disclaimer, this was a cursory check and I didn't dive into the details of how individual scores were evaluated.
on what benchmarks? pretty much every major one shows linear improvement