| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by IanCal 104 days ago

I’ve used multimodal LLMs for this sort of task and if a fine tuned model would get reasonable performance compared to frontier models I’d use that. Running things purely locally lets you massively simplify the overall architecture and data transfer requirements of some of these tasks if nothing else and lower latency means you can report problems much faster (vs transfer images off device, batch process).

> The last thing I'd want to deal with is to have a computer say something like "You're absolutely right, it was wrong of me to classify the metal debris as food".

The cnn will do that potentially more often and it can be because it’s just not seen enough examples of the debris at that angle or something else equally irrelevant to a human.