by patricklef 659 days ago
MLLMs are surprisingly bad at this out of the box, and to some extent even with fine-tuning. https://jina.ai/news/the-what-and-why-of-text-image-modality...