by silveraxe93 488 days ago

Posted 4 days ago:

> Three state of the art VLMs - Claude-3, Gemini-1.5, and GPT-4o

Literally none of those are state of the art. Academia is completely unprepared to deal with the speed AI develops. This is extremely common in research papers.

That's literally in the abstract. If I can see a completely wrong sentence 5 seconds into reading the paper, why should I read the rest?

3 comments

michaelt 488 days ago

What models would you recommend instead, for sophisticated OCR applications?

Honestly I thought Claude-3 and GPT-4o were some of the newest major models with vision support, and that models like o1 and deepseek were more reasoning-oriented than OCR-oriented.

guyomes 488 days ago

My anecdotal tests and several benchmarks suggest that Qwen2-VL-72b [0] is better than the tested models (even better than Claude 3.5 Sonnet), notably for OCR applications. It has been available since October 2024.

[0]: https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct

silveraxe93 488 days ago

For Google, definitely flash-2.0; it's a way better model. GPT-4o is kinda dated now. o1 is the one I'd pick for OpenAI. It's basically their "main" model now.

I'm not that familiar with Claude for vision. I don't think Anthropic focusses on that. But the 3.5 family of models is way better. If 3.5 Sonnet supports vision, that's what I'd use.

diggan 488 days ago

> For Google, definitely flash-2.0;

It was literally launched February 5th, ~10 days ago. I'm no researcher, and I know "academia moves slow" is of course true too, but I don't think we can expect research papers to include things that were launched probably after they finished the reviews of said paper.

Maybe papers aren't the right approach here at all, but I don't feel like it's a fair complaint they don't include models released less than 2 weeks ago.

silveraxe93 488 days ago

It was officially launched 10 days ago, but has been openly available for way longer.

Also, this is arXiv, the website that's explicitly about posting research before peer review.

diggan 488 days ago

> It was officially launched 10 days ago, but has been openly available for way longer.

So for how long? How long did the papers you've written in the past take to write? AFAIK, it takes some time.

And peer review is not the only review a paper goes through, and it was not the review I was referring to.

silveraxe93 488 days ago

Honestly? I don't know how long it's been available. But I do know it's been some time already. Enough to be aware of it when posting this on arXiv.

I'm not even disagreeing that it takes time to write papers, and it's "common" for this to happen. But it's just more evidence for what I said in my original comment:

> Academia is completely unprepared to deal with the speed AI develops

thelittleone 488 days ago

Anthropic has a beta endpoint for PDFs which has produced impressive results for me with long and complex PDFs (tables, charts etc).

esafak 487 days ago

https://docs.anthropic.com/en/docs/build-with-claude/pdf-sup...

lisnake 488 days ago

They may have been SotA at the moment of writing

silveraxe93 488 days ago

Sure, but they posted this 4 days ago. The minimum I'd expect for quality research is for them to skim the abstract before posting and change that line to:

"Models from leading AI labs" or similar. Leaving it as it is now signals either sloppiness or dishonesty.

eptcyka 488 days ago

The speed of publishing is just too slow. If you want to apply any kind of scientific rigor and have your peers check what you're doing (not even doing a full peer review), things take more time than just posting on blogs and iterating.
