by valine 749 days ago
Very curious how it performs on OCR tasks compared to InternVL. To be competitive at reading text you need tiling support, and InternVL does tiles exceptionally well.

I think CogVLM2 is even better than InternVL at OCR (my use case is extracting information from an invoice).
After some superficial testing with the poor-quality scans you can find on Kaggle, I cannot confirm that. CogVLM2 refuses to handle scans that InternVL-V1.5 can still comprehend.