by lukev
425 days ago
Tangential, but this brings up a really interesting question for me. LLMs are multilingual without really trying, assuming the languages in question are sufficiently well represented in their training corpus. I presume their ability to translate comes from the fact that there are lots of human-translated passages in their corpus: the same work in multiple languages, which lets them figure out the necessary mappings between semantic points (words).

But I wonder about the translation capability of a model trained on multiple languages but with completely disjoint documents (no documents that were translations of another, no dictionaries, etc.). Could the emerging latent "concept space" of two completely different human languages be similar enough that the model could translate well, even without ever seeing examples of how a multilingual human would do a translation?

I don't have a strong intuition here, but it seems plausible. And if so, that's remarkable, because that's basically a science-fiction babelfish or universal translator.
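To make the "similar concept space" idea a bit more concrete, here is a toy sketch in Python. It is purely illustrative and rests on a strong assumption that is not established anywhere in this thread: that the two languages embed the same concepts with identical geometry, differing only by an unknown rotation and a shuffled vocabulary order. All names (`distance_signature`, `match_without_pairs`, `lang_a`, `lang_b`) are hypothetical. Under that assumption, each point's sorted distances to all other points form a fingerprint that survives the rotation, so points can be matched across the two spaces without a single translation pair.

```python
import numpy as np

def distance_signature(points):
    """Sorted distances from each point to every other point.

    The result is unchanged by rotating the space or reordering the rows,
    so it describes each point purely by its place in the overall geometry.
    """
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return np.sort(dists, axis=1)

def match_without_pairs(space_a, space_b):
    """For each row of space_a, pick the row of space_b with the closest signature."""
    sig_a = distance_signature(space_a)
    sig_b = distance_signature(space_b)
    # cost[i, j]: how different point i's neighborhood in A looks from point j's in B
    cost = np.linalg.norm(sig_a[:, None, :] - sig_b[None, :, :], axis=-1)
    return cost.argmin(axis=1)

rng = np.random.default_rng(0)

# ASSUMPTION: a shared underlying "concept space" (20 concepts, 5 dimensions).
concepts = rng.normal(size=(20, 5))

# "Language A" sees the concepts directly. "Language B" sees the same geometry
# through an unknown orthogonal map and with its vocabulary in a different order.
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
perm = rng.permutation(20)
lang_a = concepts
lang_b = (concepts @ rotation)[perm]

guess = match_without_pairs(lang_a, lang_b)
truth = np.argsort(perm)  # inverse permutation: where concept i ended up in lang_b
accuracy = (guess == truth).mean()
print(f"fraction of concepts matched correctly: {accuracy:.2f}")
```

This only shows that matching is possible in the idealized case. Real embedding spaces learned from disjoint corpora would be noisy and at best approximately alike, so whether anything like this carries over is exactly the open question.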
In the case of non-human communication, I know there has been some fairly well-motivated theorizing about the semantics of individual whale vocalizations. You could imagine a first pass at something like this if the meaning of (say) a couple dozen vocalizations could be characterized with a reasonable degree of confidence.
Super interesting domain that's ripe for some fresh perspectives imo. Feels like at this stage, all people can really do is throw stuff at the wall. The interesting part will begin when someone can get something to stick!
> that's basically a science-fiction babelfish or universal translator
Ten years ago I would have laughed at this notion, but today it doesn't feel that crazy.
I'd conjecture that over the next ten years, this general line of research will yield some non-obvious insights into the structure of non-human communication systems.
Increasingly feels like the sci-fi era has begun -- what a time to be alive.