| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by imranq 959 days ago
	Is the vision model directly reading the screen and therefore also reading the Vimeo tags? It might be more effective to export the DOM tags and the associated elements as a Json object that is fed into chatGPT without using the vision component

1 comments

dymk 959 days ago

> Currently the Vision API doesn't support JSON mode or function calling, so we have to rely on more primitive prompting methods.

link

maccam912 959 days ago

I found that it works well to ask it to generate JSON the best it can, then pass it to gpt-3.5-turbo with the JSON response mode and instruct it to just clean up whatever input it received.

link

ishan0102 959 days ago

Perfect, I have this as a todo in my readme and I’ll implement this soon

link