|
|
|
|
|
by andy99
151 days ago
|
|
You obviously know what you’re looking for better than me, but personally I’d want to see a narrative that made sense before accepting that a smaller model somehow just performs better, even if the benchmarks say so. There may be such an explanation, it feels very dicey without one. |
|
We have a hard OCR problem.
It's very easy to make high-confidence benchmarks for OCR problems (just type out the ground truth by hand), so it's easy to trust the benchmark. Think accuracy and token F1. I'm talking about highly complex OCR that requires a heavyweight model.
Scout (Meta), a very small/weak model, is outperforming Gemini Flash. This is highly unexpected and a huge cost savings.
Some problems aren't so easily benchmarked.