Hacker News
by int_19h, 132 days ago
This is a big reason why SOTA models are trained multimodal these days. Even when you're using them for text, the knowledge they gain from images and video improves their world models.