Hacker News new | ask | show | jobs
by imranq 959 days ago
Is the vision model directly reading the screen and therefore also reading the Vimeo tags? It might be more effective to export the DOM tags and the associated elements as a Json object that is fed into chatGPT without using the vision component
1 comments

> Currently the Vision API doesn't support JSON mode or function calling, so we have to rely on more primitive prompting methods.
I found that it works well to ask it to generate JSON the best it can, then pass it to gpt-3.5-turbo with the JSON response mode and instruct it to just clean up whatever input it received.
Perfect, I have this as a todo in my readme and I’ll implement this soon