Y
Hacker News
new
|
ask
|
show
|
jobs
by
tbruckner
200 days ago
A simple cue like asking the model to 'see' or 'hear' can push a purely text-trained language model towards the representations of purely image-trained or purely-audio trained encoders.