by ryanackley 652 days ago
The technology that allows an LLM to "see" images and video is completely different though. It's not what is being trained on Common Crawl.
1 comment

not really. embeddings are embeddings. check out LLaVA
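
To make the "embeddings are embeddings" point concrete, here is a rough toy sketch of the LLaVA-style idea: features from a vision encoder are passed through a learned projection so they have the same width as the LLM's token embeddings, and the two are then fed in as one sequence. Everything here is an assumption for illustration: the dimensions are tiny, the matrices are random stand-ins rather than a real encoder or trained weights, and `random_matrix` and `project` are hypothetical helpers, not part of any library.

```python
import random

random.seed(0)

D_VISION = 4   # toy width of the vision encoder's patch features (assumed)
D_MODEL = 6    # toy width of the LLM's token embeddings (assumed)


def random_matrix(rows, cols):
    """Hypothetical helper: a rows x cols matrix of random floats."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def project(features, weight):
    """Multiply each feature row (length D_VISION) by weight (D_VISION x D_MODEL)."""
    return [
        [sum(f * w for f, w in zip(row, col)) for col in zip(*weight)]
        for row in features
    ]


# Stand-ins for a vision encoder's output and the LLM's embedding lookup.
patch_features = random_matrix(3, D_VISION)   # 3 image patches
text_embeddings = random_matrix(5, D_MODEL)   # 5 text tokens

# The learned piece: a projection from vision space into token-embedding space.
projection = random_matrix(D_VISION, D_MODEL)
image_tokens = project(patch_features, projection)

# The LLM then consumes a single sequence of same-width vectors.
sequence = image_tokens + text_embeddings
print(len(sequence), {len(v) for v in sequence})
```

Once projected, the image patches are just more vectors in the input sequence. The part that really is different from text-only training is the vision encoder and the paired image-text data used to train the projection.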