Hacker News
by KhoomeiK 957 days ago
We use Google OCR to convert the screenshot into whitespace-structured text for unimodal LLMs. For multimodal LLMs like GPT-4V, you don't need to use the OCR utilities. Thanks for the feedback though, we'll clarify!
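For anyone wondering what "whitespace-structured text" could look like, here is a minimal sketch, not the project's actual code. It assumes the OCR step has already produced words with pixel coordinates; the `Word` class, the `layout_words` function, and the `char_width` / `line_height` defaults are all hypothetical names and values chosen for illustration. The idea is to snap each word onto a character grid so that its row and column roughly mirror where it appeared in the screenshot.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR result (hypothetical shape): text plus top-left pixel position."""
    text: str
    x: int
    y: int


def layout_words(words, char_width=10, line_height=20):
    """Place words on a character grid that approximates their screen position.

    char_width and line_height are assumed pixel sizes of one grid cell.
    """
    # Bucket words into grid rows by vertical position.
    rows = {}
    for w in words:
        rows.setdefault(w.y // line_height, []).append(w)
    if not rows:
        return ""

    lines = []
    for row in range(max(rows) + 1):
        line = ""
        for w in sorted(rows.get(row, []), key=lambda w: w.x):
            col = w.x // char_width
            # If the target column is already occupied, keep one space
            # between neighbouring words instead of overwriting.
            if line and col <= len(line):
                col = len(line) + 1
            line = line.ljust(col) + w.text
        lines.append(line)  # rows with no words become blank lines
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [Word("File", 0, 0), Word("Edit", 60, 0), Word("Search", 0, 40)]
    print(layout_words(sample))
```

With the sample above, "File" and "Edit" land on the same line separated by spaces, and "Search" appears two lines lower, so a text-only model still sees the rough layout. A multimodal model that takes the screenshot directly would skip this step entirely.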