by piva00 1 hour ago
What?

LLMs were designed for text; it's in the name, "large language model". Only with specialised encoders like vision transformers were they able to process images as well, but you're absolutely wrong about the original design intent.

In the end you just added misinformation. Just save the comment to your favourites and set a reminder to check it again in a few years, like you wanted.

1 comment

The first technological breakthroughs were in face and red-eye detection in 2003, then object detection between 2008 and 2012. Text models didn't become useful until about 2016. Please watch the first course of Dr. Fei-Fei Li's lectures on the subject.