Hacker News
by
efavdb
13 days ago
The article says this misses important details, e.g. data that might be in the image.
breadislove
13 days ago
Very bad take. With most modern multimodal models you get way better performance than going to text first.
emil_sorensen
13 days ago
It's a cost/latency trade-off in production, and very use-case dependent.