HN Mirror


by jkelleyrtp 575 days ago

I think this might be the “it” moment for AI/LLMs. I was hiking with a friend recently and we talked about this at length.

The ARC-AGI results from o3 are apparently the result of chain of thought given enough time to explore a solution space. Reasoning might simply be a higher-dimensional form of Rubik’s Cube solving: BFS, search, backtracking, etc. It seems unlikely that humans think in “tokens”, so why should LLMs?
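A toy illustration of the "reasoning as search" framing (not a claim about how o3 works, which isn't public): breadth-first search over a tiny made-up permutation puzzle, a far smaller cousin of the Rubik’s Cube with only two hypothetical moves.

```python
from collections import deque

def rotate(state):
    """Move the first piece to the end (a toy stand-in for one cube turn)."""
    return state[1:] + state[:1]

def swap_front(state):
    """Swap the first two pieces (a second toy move)."""
    return (state[1], state[0]) + state[2:]

MOVES = {"rotate": rotate, "swap": swap_front}

def bfs_solve(start, goal):
    """Breadth-first search: return the shortest list of move names that
    turns `start` into `goal`, or None if the goal is unreachable."""
    parent = {start: None}  # state -> (previous state, move name)
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path = []
            while parent[state] is not None:  # walk back to the start
                state, move_name = parent[state]
                path.append(move_name)
            return path[::-1]
        for name, move in MOVES.items():
            nxt = move(state)
            if nxt not in parent:  # never revisit a state
                parent[nxt] = (state, name)
                frontier.append(nxt)
    return None

print(bfs_solve((3, 1, 2), (1, 2, 3)))  # prints ['rotate']
```

Because BFS expands states level by level, the first time it reaches the goal it has found a shortest move sequence; backtracking or heuristic search trade that guarantee for less memory or fewer expansions.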

By staying in latent space, the models are free to describe an “idea” in higher resolution than language allows. English is coarse-grained. Latent space is a much finer representation of ideas and their interplay.

Latent space is also much cheaper to execute in. The model can think without the language encoding/decoding step. This lets it branch out into hundreds of ideas and explore only the most useful ones in a fraction of the time that reasoning “out loud” would take.
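To make the encode/decode contrast concrete, here is a toy sketch of the two loops, assuming a made-up 2-D "model" with hypothetical weights and a three-token vocabulary (nothing like a real LLM). Token mode decodes each step to a token and re-embeds it; latent mode feeds the hidden state straight back in.

```python
import math

# Hypothetical 2-D toy "model": one weight matrix, chosen only for illustration.
W = [[0.9, -0.4], [0.3, 0.8]]
# Hypothetical three-token vocabulary with 2-D embeddings.
EMBED = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-0.7, 0.7]}

def forward(x):
    """One toy forward pass: hidden = tanh(W @ x)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def decode(h):
    """Discretize: pick the token whose embedding scores highest against h."""
    return max(EMBED, key=lambda tok: sum(e * v for e, v in zip(EMBED[tok], h)))

def reason(x, steps, latent):
    """Run `steps` reasoning steps. In latent mode the hidden state is fed straight
    back in; in token mode it is decoded to a token and re-embedded first."""
    trace = []
    for _ in range(steps):
        h = forward(x)
        if latent:
            x = h                # continuous "thought": no decode/encode round trip
            trace.append(h)
        else:
            tok = decode(h)      # squeeze the state through the vocabulary...
            x = EMBED[tok]       # ...and continue from that token's embedding
            trace.append(tok)
    return x, trace

_, tokens = reason([1.0, 0.0], steps=4, latent=False)
_, thoughts = reason([1.0, 0.0], steps=4, latent=True)
print(tokens)    # each step is one of only 3 vocabulary items
print(thoughts)  # each step is an arbitrary real-valued vector
```

In a real model the skipped work is the projection onto the whole vocabulary plus sampling; whether that makes latent reasoning cheaper end to end depends on the model and how many steps it takes.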

The states also don’t need to be tied to language. Feed in a robot’s state, time-series data, or any other abstract data. Reason in category theory, linear algebra, or complex analysis. Humans are hard-wired for one set of math; an abstract latent space can represent anything.
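A minimal sketch of how non-text data could enter a latent space, assuming a single linear adapter with hypothetical weights (a real one would be learned). It plays the same role for, say, joint angles that token embeddings play for text.

```python
def linear_adapter(weights, bias, features):
    """Project a non-text feature vector to the model's hidden width.
    `weights` has one row per hidden dimension, each as long as `features`."""
    if len(weights) != len(bias) or any(len(row) != len(features) for row in weights):
        raise ValueError("shape mismatch between weights, bias and features")
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

# Hypothetical robot state: three joint angles in radians.
joint_angles = [0.1, -0.5, 1.2]
# Hypothetical adapter mapping 3 features to a 4-wide hidden state.
W_adapt = [[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0],
           [0.5, 0.5, 0.5]]
b_adapt = [0.0, 0.0, 0.0, 0.1]
print(linear_adapter(W_adapt, b_adapt, joint_angles))
```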

I’m a bit disappointed OpenAI didn’t stumble on this first. I’ve been skeptical of LLMs since their big debut last year. LLMs seem like a great way of solving language, but reasoning is much more complex. Once you grok the math behind the current models, you immediately question why the encoding/decoding step is there. Diffusion models are incredible, but LLMs felt like they lacked the same creativity. Encoding/decoding forces a token-based discretization and therefore a loss of complexity.
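A toy demonstration of that discretization loss, assuming a hypothetical three-word vocabulary and greedy (argmax) decoding: many different hidden states decode to the same token, so whatever distinguished them is gone once the model has to commit to a word.

```python
import random

# Hypothetical vocabulary and unembedding matrix (one weight row per token).
VOCAB = ["yes", "no", "maybe"]
UNEMBED = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]]

def decode(h):
    """Greedy decoding: return the token with the largest logit for hidden state h."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in UNEMBED]
    return VOCAB[logits.index(max(logits))]

rng = random.Random(0)
states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(1000)]
tokens = {decode(h) for h in states}
# 1000 continuous states collapse onto at most len(VOCAB) tokens.
print(len(set(states)), "hidden states ->", len(tokens), "tokens")
```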

After the byte-latent paper, it was quite clear we’d see this paper. This truly might be the “it” moment.

rlupi 575 days ago

IMHO, the problem (for us) with this approach lies in its logical consequences:

1) If large AI models become more powerful by avoiding language, embeddings of AI state become even more tied to the model they originate from than they are now.

Consequence: AI progress stalls, as companies that use AI need to invest increasing amounts of money to reindex their growing corpora.

This is already a problem; it becomes even more of a lock-in mechanism.

If this is overcome...

2) Embeddings become a viral mechanism: it makes sense for a large company that commands a market to require its suppliers to use the same AI models, because they can then transfer state via embeddings rather than external formats.

This allows cutting down on decision mechanisms that would otherwise require expensive coordination.

Something similar will happen within companies, IMHO: https://rlupi.com/okr-planning-as-belief-revision

3) Eventually this could result in another exponential-growth and lock-in mechanism, also at the expense of most tech people, as more and more is done outside our interface with AI (i.e., programming and software-architecture improvements will themselves move below the language level, and we'll have to reverse-engineer increasingly opaque improvements).

4) It ends with the impossibility of AI alignment.
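Point 1 can be sketched with a toy vector index (every name here, including both embedding "models", is hypothetical): vectors from different models live in unrelated spaces and can't be compared, so switching models means re-embedding the whole corpus.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Embedder = Callable[[str], List[float]]

@dataclass
class VectorIndex:
    """A toy index that remembers which model produced its vectors."""
    model_id: str
    vectors: Dict[str, List[float]] = field(default_factory=dict)

def build_index(model_id: str, embed: Embedder, corpus: Dict[str, str]) -> VectorIndex:
    """Embed every document. Switching models means paying this cost again."""
    index = VectorIndex(model_id)
    for doc_id, text in corpus.items():
        index.vectors[doc_id] = embed(text)
    return index

def search(index: VectorIndex, model_id: str, query: List[float]) -> str:
    """Return the best-matching doc id by dot product."""
    # Vectors from different models live in unrelated spaces, so never mix them.
    if model_id != index.model_id:
        raise ValueError(f"index built with {index.model_id!r}, query from {model_id!r}")
    return max(index.vectors,
               key=lambda d: sum(a * b for a, b in zip(index.vectors[d], query)))

# Two hypothetical embedding "models" with incompatible output spaces.
def embed_a(text: str) -> List[float]:
    return [float(len(text)), float(text.count("a"))]

def embed_b(text: str) -> List[float]:
    return [float(text.count("e")), float(len(text)), 1.0]

corpus = {"d1": "banana", "d2": "kiwi"}
index_a = build_index("model-a", embed_a, corpus)
print(search(index_a, "model-a", embed_a("banana")))  # prints d1
# Moving to model-b requires re-embedding every document in the corpus:
index_b = build_index("model-b", embed_b, corpus)
```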

---

I wrote a bit about this at the start of the year, while dealing with burnout, and have since deleted those confused ramblings. You can still find them on archive.org: https://web.archive.org/web/20240714153146/https://rlupi.com...

otikik 575 days ago

> It seems unlikely that humans think in “tokens” so why do LLMs?

I can think of one reason: scrutability. It’s going to be even harder to understand how a response gets produced if there isn’t even a text-based representation to help humans understand it.

IshKebab 575 days ago

I think we're already way beyond the point where anyone really understands how a response is produced, even without this.

anon373839 575 days ago

Indeed. Even if an LLM tells you its “reasoning” process step by step, it’s not actually an exposition of the model’s internal decision process. It’s just more text that, when generated, improves the chances of a good final output.

nfw2 575 days ago

The token generation part isn't well understood, but the output "chain-of-thought" used to produce the final answer can be scrutinized for correctness with a traditional CoT model (although this would require model providers not to hide reasoning tokens).
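As a toy example of scrutinizing visible reasoning, assuming the chain of thought contains plain "a op b = c" arithmetic steps: the checker below flags steps whose stated result is wrong. Real chains are free-form text, so this only catches one narrow class of error, and the step format is an assumption.

```python
import re

# Matches a single integer step such as "12 * 3 = 36" or "7 - -2 = 9".
STEP = re.compile(r"^\s*(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)\s*$")
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def check_chain(lines):
    """Return (line_number, line) for every arithmetic step whose stated result
    is wrong. Lines that are not simple 'a op b = c' steps are skipped, not judged."""
    errors = []
    for i, line in enumerate(lines, start=1):
        m = STEP.match(line)
        if m is None:
            continue
        a, op, b, c = int(m.group(1)), m.group(2), int(m.group(3)), int(m.group(4))
        if OPS[op](a, b) != c:
            errors.append((i, line))
    return errors

chain = ["First multiply.", "12 * 3 = 36", "36 + 5 = 42", "So the answer is 42."]
print(check_chain(chain))  # prints [(3, '36 + 5 = 42')]
```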

pigpop 575 days ago

You can save the hidden states and convert them into a more interpretable format. They're still recorded, and you could make modifications at different steps to see how that would change the conclusion.
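A toy sketch of both ideas, recording states and intervening at one step, on a made-up 2-D latent loop with hypothetical weights. `read_out` is a crude projection onto a hypothetical vocabulary, in the spirit of "logit lens"-style probing rather than any particular tool.

```python
import math

# Hypothetical toy weights and a three-word readout vocabulary.
W = [[0.9, -0.4], [0.3, 0.8]]
VOCAB = {"left": [1.0, 0.0], "right": [0.0, 1.0], "stop": [-0.7, 0.7]}

def step(h):
    """One latent reasoning step: h' = tanh(W @ h)."""
    return [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]

def run(h0, steps, patch=None):
    """Run the latent loop and record every hidden state.
    `patch=(i, vec)` overwrites the state entering step i, for intervention tests."""
    states, h = [], h0
    for i in range(steps):
        if patch is not None and patch[0] == i:
            h = patch[1]
        h = step(h)
        states.append(h)
    return states

def read_out(h, k=2):
    """Make a saved state human-readable: its k best-matching vocabulary words."""
    scores = {word: sum(e * x for e, x in zip(vec, h)) for word, vec in VOCAB.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

baseline = run([1.0, 0.0], steps=4)
patched = run([1.0, 0.0], steps=4, patch=(2, [0.0, 1.0]))
for i, (b, p) in enumerate(zip(baseline, patched)):
    print(i, read_out(b), read_out(p))  # compare readouts before/after the patch
```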

layer8 575 days ago

IMO we won’t have the “it” moment until we have continuous learning (training) in some fashion.

mattxxx 574 days ago

^ This, and we need to be continually learning on an energy budget similar to what a human spends per hour.

rlupi 574 days ago

The main reason we can't do that now is that we require models to be digitally reproducible (IMHO, but also read Geoffrey Hinton on mortal computation).

The energy cost comes from error correction as much as from the training algorithms.

jokethrowaway 574 days ago

This sounds like brute-forcing a solution to make up for a lack of intelligence.

In an IQ-style test like ARC-AGI, a human sees the pattern instantly and effortlessly. o3 tries N paths until it stumbles on the right one and assesses that there is a pattern.

I think we need a radically different architecture, this is a gimmick.
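The brute-force picture described here, sampling guesses until one checks out, looks roughly like the generate-and-test loop below. It's a toy with a hypothetical pool of list-transformation rules, not a description of how o3 works internally (which isn't public); the point is that the cost scales with the number of attempts rather than with any insight.

```python
import random

# Hypothetical pool of candidate "patterns" a solver might guess at.
CANDIDATES = {
    "reverse": lambda xs: xs[::-1],
    "sort": sorted,
    "double": lambda xs: [2 * x for x in xs],
    "add_one": lambda xs: [x + 1 for x in xs],
}

def fits(rule, examples):
    """A candidate counts as 'the pattern' only if it reproduces every example."""
    return all(list(rule(inp)) == out for inp, out in examples)

def generate_and_test(examples, n, seed=0):
    """Sample up to n random guesses; return (rule name, attempts used) or (None, n)."""
    rng = random.Random(seed)
    names = list(CANDIDATES)
    for attempt in range(1, n + 1):
        name = rng.choice(names)
        if fits(CANDIDATES[name], examples):
            return name, attempt
    return None, n

examples = [([1, 2, 3], [2, 4, 6]), ([0, 5], [0, 10])]
print(generate_and_test(examples, n=200))
```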

pigpop 575 days ago

I think this is a step in the right direction, but not the end. It takes the sampler out of the equation during most of the reasoning process, but the sampler is still important for the "show your work" aspects of reasoning or solving a problem. Balancing when to think against when to write down or commit to certain thoughts is important. There are many more pieces to the puzzle.
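One simple way to frame "when to commit" is a confidence threshold: keep reasoning in latent space until the decoded distribution is peaked enough, then emit a token. This is a hypothetical policy for illustration; `logit_trace` stands in for per-step logits read out of some latent reasoning loop, and the 0.9 threshold is arbitrary.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def think_then_commit(logit_trace, threshold=0.9):
    """Return (step, token_index) at the first step whose top probability reaches
    `threshold`; if no step is confident enough, commit at the final step."""
    if not logit_trace:
        raise ValueError("need at least one step of logits")
    for step, logits in enumerate(logit_trace):
        probs = softmax(logits)
        best = argmax(probs)
        if probs[best] >= threshold:
            return step, best
    return len(logit_trace) - 1, argmax(logit_trace[-1])

trace = [[0.1, 0.2, 0.0], [1.0, 3.0, 0.0], [0.0, 6.0, 0.0]]
print(think_then_commit(trace))  # prints (2, 1): only the third step is confident enough
```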

JambalayaJimbo 574 days ago

What does “latent space” mean here?
