| HN Mirror



	by efavdb 13 days ago
	Article says this misses important details, e.g. data that might be in the image.

1 comment

very bad take. with most modern multimodal models you get way better performance than going to text first

it's a cost/latency trade-off in production + very use-case dependent