by Thaxll 467 days ago
Do you benchmark the right thing, though? It seems to focus a lot on images, charts, etc.

The 95% figure from their benchmark: "we evaluate them on our internal “text-only” test-set containing various publication papers, and PDFs from the web; below:"

Text only.

Our goal is to benchmark on real-world data, which is often more complex than plain text. If we have to make the benchmark data easier for the model to perform better, it's not an honest assessment of reality.