Show HN: GPT-4 vision utilities to browse the web


	Show HN: GPT-4 vision utilities to browse the web (github.com)
	10 points by asim-shrestha 956 days ago

1 comments

ilaksh 956 days ago

The README seems mismatched.


asim-shrestha 956 days ago

Mind elaborating here? Happy to update the README if there are issues


ilaksh 956 days ago

It says Google OCR in the usage example, but your description above mentions GPT-4 vision, so the two don't match. I also see nothing about an OpenAI API key in the example.


KhoomeiK 956 days ago

We use Google OCR to convert the screenshot into whitespace-structured text for unimodal LLMs. For multimodal LLMs like GPT-4V, you don't need to use the OCR utilities. Thanks for the feedback though, we'll clarify!
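To make the "whitespace-structured text" idea concrete, here is a minimal sketch of how OCR word boxes might be laid out on a character grid so that a text-only LLM still sees roughly where things sit on the page. This is not the project's actual code: the `Word` class, the `layout_text` function, and the `char_width` / `line_height` defaults are all hypothetical, and a real OCR response would need to be mapped into this shape first.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR result (hypothetical shape): text plus top-left pixel position."""
    text: str
    x: int
    y: int


def layout_text(words, char_width=10, line_height=20):
    """Place each word on a character grid that mirrors its screen position.

    char_width and line_height are assumed pixel sizes of one grid cell.
    """
    # Bucket words into grid rows by their vertical position.
    rows = {}
    for w in words:
        rows.setdefault(w.y // line_height, []).append(w)

    lines = []
    for row in range(max(rows, default=-1) + 1):
        line = ""
        for w in sorted(rows.get(row, []), key=lambda w: w.x):
            col = w.x // char_width
            if line:
                # Pad to the target column, keeping at least one space between words.
                line += " " * max(1, col - len(line))
            else:
                line = " " * col
            line += w.text
        # Rows with no words become blank lines, preserving vertical gaps.
        lines.append(line)
    return "\n".join(lines)


if __name__ == "__main__":
    page = [Word("Login", 0, 0), Word("Sign up", 200, 0), Word("Search", 50, 40)]
    print(layout_text(page))
```

The resulting string can then be placed in the prompt of a text-only model; with a multimodal model such as GPT-4V, the screenshot would be sent directly and this step skipped.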
