Hacker News
by tempusalaria, 587 days ago
This is very similar to how LLMs are taught to understand images in LLaVA-style models: the image embeddings are projected into the language model's token embedding space and placed into the existing language token stream.
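
For anyone who wants to see the shape of that idea, here is a rough sketch in plain Python. It is not LLaVA's actual code: the sizes, the random "features", the single linear projection, and the helper names `project` and `splice_image` are all made up for illustration. It only shows that vision features get mapped to the same width as the text token embeddings and then sit in the same sequence.

```python
import random


def project(patch_features, weight, bias):
    """Linear map from the vision-feature width to the LM embedding width.

    weight holds one row per output dimension (hypothetical layout).
    """
    return [
        [sum(x * w for x, w in zip(feat, row)) + b for row, b in zip(weight, bias)]
        for feat in patch_features
    ]


def splice_image(text_embeddings, image_embeddings, image_pos):
    """Insert the projected image embeddings into the text sequence at image_pos."""
    return text_embeddings[:image_pos] + image_embeddings + text_embeddings[image_pos:]


rng = random.Random(0)

# Toy sizes, chosen only for this sketch.
VISION_DIM, LM_DIM = 4, 6
NUM_PATCHES, NUM_TEXT_TOKENS = 3, 5

# Stand-ins for a vision encoder's patch features and the LM's text token embeddings.
patch_features = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(NUM_PATCHES)]
text_embeddings = [[rng.gauss(0, 1) for _ in range(LM_DIM)] for _ in range(NUM_TEXT_TOKENS)]

# Stand-in for the learned projection; in a real model these weights are trained.
weight = [[rng.gauss(0, 0.1) for _ in range(VISION_DIM)] for _ in range(LM_DIM)]
bias = [0.0] * LM_DIM

image_embeddings = project(patch_features, weight, bias)
sequence = splice_image(text_embeddings, image_embeddings, image_pos=1)

print(len(sequence), len(sequence[0]))  # 8 6
```

From the language model's side, the result is just a longer sequence of vectors of the usual width; nothing downstream needs to know which positions came from an image.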