Hacker News new | ask | show | jobs
by tbruckner 200 days ago
A simple cue like asking the model to 'see' or 'hear' can push a purely text-trained language model towards the representations of purely image-trained or purely-audio trained encoders.