| HN Mirror



	by asim-shrestha 956 days ago
	Mind elaborating here? Happy to update the README if there are issues


ilaksh 956 days ago

It says Google OCR in the usage example, but your description above mentions GPT-4 vision, so this makes no sense. I see nothing about an OpenAI API key in the example.


KhoomeiK 956 days ago

We use Google OCR to convert the screenshot into whitespace-structured text for unimodal LLMs. For multimodal LLMs like GPT-4V, you don't need to use the OCR utilities. Thanks for the feedback though, we'll clarify!
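
To illustrate what "whitespace-structured text" could mean here, below is a minimal sketch, not the project's actual code. It assumes the OCR step yields words with pixel coordinates; the `Word` class, `layout_as_text`, and the `char_width`/`line_height` values are all hypothetical names and numbers chosen for illustration. Each word is placed on a character grid so that its position on the screenshot survives as spaces and blank lines in the text handed to a text-only LLM.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR result (hypothetical shape): recognized text and top-left pixel position."""
    text: str
    x: int
    y: int


def layout_as_text(words, char_width=10, line_height=20):
    """Map pixel positions to a character grid so layout is preserved as whitespace."""
    if not words:
        return ""

    # Bucket words into text rows by vertical position.
    rows = {}
    for w in words:
        rows.setdefault(w.y // line_height, []).append(w)

    lines = []
    for row in range(max(rows) + 1):
        line = ""
        for w in sorted(rows.get(row, []), key=lambda w: w.x):
            col = w.x // char_width
            # Pad up to the word's column; keep at least one space between neighbours.
            pad = max(col - len(line), 1 if line else 0)
            line += " " * pad + w.text
        lines.append(line)  # rows with no words become blank lines
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [Word("Login", 0, 0), Word("Help", 200, 0), Word("Username", 0, 40)]
    print(layout_as_text(demo))
```

With a multimodal model this step would be skipped and the screenshot passed directly, as the comment above describes.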
