by wongarsu 1476 days ago
Was DALL-E 2 trained on captions from multiple languages? If so, this makes a lot of sense. Somewhere early in the model the words "bird", "vogel", "oiseau" and "pájaro" have to be mapped to the same concept. And "Apoploe vesrreaitais" happens to map to the same concept. Or maybe "Apoploe vesrreaitais" is rather the tokenization of that concept, since it also appears in the output. So in a sense DALL-E is using an internal language to make sense of our world.

This looks like how the artificial language Lojban was constructed: its words blend parts from completely unrelated languages, to the point where none of the original words are recognizable in the result.
The original words aren't recognizable at first glance, but they do serve as potential mnemonics for remembering the terms/definitions for any learners who speak one of those source languages (English, Spanish, Mandarin, Arabic, Russian, Hindi).
But that's expected behavior for a language model (especially a VAE), so where's the novelty? In a VAE, the vectors in the latent space are probabilistic, so this is basically the NLP version of the classic VAE face-generation demo, where you can tweak the parameters to emphasize or de-emphasize a feature.
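For anyone who hasn't seen the VAE trick being referred to, here is a minimal sketch of the idea, not anything from DALL-E itself. It assumes a made-up 3-dimensional latent space, made-up encoder outputs (`mu`, `log_var`), and a hypothetical `feature_direction` axis; a real model would learn all of these and would pass the result through a decoder, which is omitted here.

```python
import math
import random

def sample_latent(mu, log_var, rng):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1).
    # sigma is recovered from the log-variance as exp(0.5 * log_var).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def shift_along(z, direction, strength):
    # Move a latent vector along a feature direction to emphasize
    # (strength > 0) or de-emphasize (strength < 0) that feature.
    return [zi + strength * di for zi, di in zip(z, direction)]

rng = random.Random(0)

# Made-up encoder outputs for one input (assumption, not real model values).
mu = [0.0, 1.0, -0.5]
log_var = [-2.0, -2.0, -2.0]

# Hypothetical axis in latent space that controls one feature.
feature_direction = [1.0, 0.0, 0.0]

z = sample_latent(mu, log_var, rng)
z_more = shift_along(z, feature_direction, 2.0)   # emphasize the feature
z_less = shift_along(z, feature_direction, -2.0)  # de-emphasize the feature
```

Decoding `z_more` and `z_less` in a trained VAE is what produces the "more of this feature / less of this feature" outputs; only the latent arithmetic is shown here.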
Novel in engineering multiple concepts together, if nothing else!