| I think this might be the “it” moment for AI/LLMs. I was hiking with a friend recently and we talked about this at length. The ARC-AGI results from o3 are apparently a result of chain of thought given enough time to explore a solution space. Reasoning might simply be a higher-dimensional form of Rubik's cube solving: BFS, search, backtracking, etc. It seems unlikely that humans think in “tokens”, so why do LLMs? By staying in latent space, the models are free to describe an “idea” in higher resolution than what language allows. English is coarse, granular. Latent space is a much finer representation of ideas and their interplay. Latent space is also much cheaper to execute in. The model can think without the language encoding/decoding step. This lets it branch out hundreds of ideas and explore only the most useful ones in a fraction of the time that reasoning “out loud” would take. The states also don’t need to be tied to language. Feed in a robot’s state, time series data, or any abstract data. Reason in category theory or linear algebra or complex analysis. Humans are hardwired for one set of math; an abstract latent space can represent anything. I’m a bit disappointed OpenAI didn’t stumble on this first. I’ve been skeptical of LLMs since their big debut last year. LLMs seem like a great way of solving language, but reasoning is much more complex. Once you grok the math behind the current models, you immediately question why the encoding/decoding step is there. Diffusion models are incredible, but it felt like LLMs lacked the same creativity. Encoding/decoding forces a token-based discretization and therefore a loss of complexity. With the byte-latent paper it was quite clear we’d see this paper. This truly might be the “it” moment. |
1) If large AI models become more powerful by avoiding language, embeddings of AI state become even more tied to the model they originate from than they are now.
Consequence: AI progress stalls, as companies that use AI need to invest increasing amounts of money to reindex their growing corpora.
This is already a problem; it becomes even more of a lock-in mechanism.
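To make point 1 concrete, here is a toy sketch of why embeddings do not carry over between models. Everything in it is an assumption for illustration: the two "models" are just fixed random linear maps (`make_model`, `embed` and `DIM` are made-up names, not any real library's API), and the "document" is a random feature vector. Real embedding models are far more complex, but the dependence on the model's own coordinate system is the same in spirit.

```python
import math
import random

DIM = 64  # hypothetical embedding width, chosen arbitrarily


def make_model(seed):
    """Stand-in for an embedding model: a fixed random linear map (assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(DIM)]


def embed(model, features):
    """Project the input features into the model's own embedding space."""
    return [sum(w * x for w, x in zip(row, features)) for row in model]


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# One and the same "document", embedded by two unrelated models.
doc = [random.Random(0).gauss(0.0, 1.0) for _ in range(DIM)]
model_a = make_model(1)
model_b = make_model(2)

same_model = cosine(embed(model_a, doc), embed(model_a, doc))
cross_model = cosine(embed(model_a, doc), embed(model_b, doc))

print(f"same model:  {same_model:.3f}")   # identical vectors, so similarity is 1
print(f"cross model: {cross_model:.3f}")  # same document, unrelated coordinates
```

The same document compared across the two spaces no longer looks like itself, so an index built with one model cannot be queried with another. Switching models means re-embedding the whole corpus, which is the reindexing cost mentioned above.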
If this is overcome...
2) Embeddings become a viral mechanism: it makes sense for a large company that dominates a market to require its suppliers to use the same AI models, because they can then transfer state via embeddings rather than external formats.
This makes it possible to cut down on decision processes that otherwise require expensive coordination.
Something similar will happen within companies IMHO: https://rlupi.com/okr-planning-as-belief-revision
3) Eventually this potentially results in another round of exponential growth and lock-in, also at the expense of most tech people, as more and more is done outside our interface with AI (i.e. programming and software architecture improvements will themselves move below the language level, and we'll have to reverse engineer increasingly opaque improvements).
4) It ends with the impossibility of AI alignment.
---
I wrote a bit about this at the start of the year, when I had a burnout. I have since deleted those confused ramblings, but you can still find them on archive.org: https://web.archive.org/web/20240714153146/https://rlupi.com...