|
|
|
|
|
by cedilla
1249 days ago
|
|
Phoenician script is the common ancestor of Latin, greek and Cyrillic script. You're probably thinking of the indo-european language family, which accounts for about 45% of native language speakers. The largest language family in the world, but not even a majority. Scripts and languages change over the course of decades, and while there are well known mechanisms to those changes, trying to deduce hieroglyphics or ancient Egyptian from a modern corpus is impossible. The idea that there is some shared structure in all language is known as universal grammar. If that structure exists is still hotly debated. |
|
I know this may be not be the answer that you are looking for, but the way most of these ML systems are designed is based on the idea that life, the universe, and everything can be modeled by a series of joint probabilities. For toy problems you draw a diagram
https://en.m.wikipedia.org/wiki/Graphical_model
It's an old idea in AI (predates even ML) but people have never been able to do anything useful with it outside of exam problems until the emergence of language models on modern deep learning hardware. All of a sudden variational learning and causal inference are not merely statistical word problems for grad students any more. This is the key to how most of the custom deep learning based avatar generators work. They use a Variational Autoencoder. For LLMs, it is in the form of a transformer which contains a sampling step (sampling from a distribution is the key to Bayesian methods).
I would like to emphasize the theory of probabilistic learning is very different from the actual practice. The theory we have today isn't much different from 20 years ago. Implement the methods in for example Murphy's Probabilistic ML book and they would be useless if you don't have access to modern deep learning hardware and gradient descent optimizers. Without deep learning, we won't have LLMs, regardless of how fancy the variational learning theories are.