| HN Mirror


by TobTobXX 113 days ago

> Muse Spark is a natively multimodal reasoning model with support for [...] visual chain of thought [...].

Do they mean "the chain of thought is visible to the user" (i.e., not hidden like ChatGPT's), or "the medium of the chain of thought is not text but visuals" (i.e., thinking in images)?

I'd guess the former, since it wouldn't be economical to generate transient images just for thinking. But in that case I'm not sure why they'd highlight it. If it were the second thing, that would be extremely interesting: it would be the first model not to think in text.


fc417fc802 113 days ago

Perhaps more importantly, will their chain of thought be "real"? So far the ones I've seen seem to be elaborate fakery. They look good unless you dig in at which point you often find that it merely looks plausible on the surface but that something else is going on under the hood.


seanhunter 113 days ago

I don't know what you mean by that. We always know what's going on under the hood: linear algebra, the attention mechanism, etc.

To a first approximation, all "chain of thought" means is that instead of having to prompt the model to discuss everything in text and then decide at the end[1], it now does that automatically, so you don't need to prompt it.

[1] Which used to bring about very substantial improvements in performance on some tasks.


fc417fc802 112 days ago

I think it was clear from context that "under the hood" wasn't referring to the math but rather to the contents of the trace. What's written (often?) isn't what's actually being "thought" about. The trace is a trained output similar to the final output, which is to say that it's fake. There are research papers on the topic, in particular showing that models can be trained to print other arbitrary stuff during the "thinking" phase instead.

You can easily see this for yourself by carefully walking through a given trace with a critical eye. Here's an example of mine from a few days ago: https://news.ycombinator.com/item?id=47623324


seanhunter 112 days ago

Yeah, now I get what you're saying. Yes, the trace isn't what's actually happening. What's actually happening is just the attention mechanism etc. The model doesn't "think" in human language; it thinks in linear algebra. The thing is that before chain of thought, it used to be necessary to get the model to output some language, because that was the only thing it had to attach processing to (so if you wanted more processing, you needed to get it to generate more text). Now we get the model to generate some text that is a simulacrum of the thought it might hypothetically be doing, but in actual practice chain of thought is just something they get the model to do by training it in a certain way.


rain-princess 113 days ago

Actually, I believe that behavior shows up in Gemini chats: if you are doing a visual task, it will generate intermediate diagrams. Research papers have also proposed approaches to that effect (generating turtle diagrams) since 2024.