Show HN: GPT-4 vision utilities to browse the web (github.com)
10 points by asim-shrestha 956 days ago

The README seems mismatched.
Mind elaborating here? Happy to update the README if there are issues.
It says Google OCR in the usage example, but your description above mentions GPT-4 vision, so the two don't line up. I also see nothing about an OpenAI API key in the example.
We use Google OCR to convert the screenshot into whitespace-structured text for unimodal LLMs. For multimodal LLMs like GPT-4V, you don't need to use the OCR utilities. Thanks for the feedback though, we'll clarify!
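One way to picture "whitespace-structured text" is to place each OCR'd word on a character grid, so horizontal gaps on the page become runs of spaces and vertical gaps become blank lines. The sketch below assumes the OCR step has already returned each word with a top-left pixel coordinate; `OcrWord`, `layout_as_text`, `char_width` and `line_height` are hypothetical names chosen for illustration, not the project's actual utilities, and the Google OCR call itself is not shown.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """Hypothetical shape of one OCR result: the word and its top-left pixel position.
    Real OCR APIs return richer bounding polygons; only the top-left corner is used here."""
    text: str
    x: int  # left edge in pixels
    y: int  # top edge in pixels


def layout_as_text(words, char_width=10, line_height=20):
    """Render OCR words as plain text that roughly preserves the screenshot's layout.

    Assumes a fixed character width and line height (both hypothetical defaults)
    to map pixel coordinates onto a row/column character grid.
    """
    rows = {}
    for w in words:
        row = w.y // line_height
        col = w.x // char_width
        rows.setdefault(row, []).append((col, w.text))

    if not rows:
        return ""

    lines = []
    # Walk every row between the first and last occupied one so empty rows
    # become blank lines, preserving vertical spacing.
    for row in range(min(rows), max(rows) + 1):
        line = ""
        for col, text in sorted(rows.get(row, [])):
            pad = col - len(line)
            if line:
                # Keep at least one space between words that would otherwise collide.
                pad = max(1, pad)
            line += " " * max(0, pad) + text
        lines.append(line.rstrip())
    return "\n".join(lines)


words = [OcrWord("Search", 0, 0), OcrWord("Login", 100, 0), OcrWord("Results", 20, 40)]
print(layout_as_text(words))
```

A unimodal LLM then receives this text instead of the image, whereas a multimodal model such as GPT-4V can take the screenshot directly and skip the step. A fixed grid is the simplest choice; it degrades on pages with mixed font sizes, where scaling by each word's own box height would be more robust.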