by fxtentacle 609 days ago
To me, the technique presented here seemed like a logical continuation of the methods OpenAI used to train their Dota agents:

https://arxiv.org/pdf/1912.06719v1

And, arguably, Facebook's unsupervised pre-training for their multi-modal speech-to-text models is kind of the same idea as unsupervised pre-training for a multi-modal text-to-image diffuser.

https://ai.meta.com/research/publications/wav2vec-2.0-a-fram...