Hacker News
by TobTobXX 66 days ago
> Muse Spark is a natively multimodal reasoning model with support for [...] visual chain of thought [...].

Do they mean "the chain of thought is visible to the user" (i.e. not hidden like ChatGPT), or "the medium of the chain of thought is not text, but visuals" (i.e. thinking in images)?

I'd guess the former, since it wouldn't be economical to generate transient images just for thinking. But I'm not sure why they'd highlight that in that case. If it were the latter, that'd be extremely interesting: the first model not to think in text.

2 comments

Perhaps more importantly, will their chain of thought be "real"? So far the ones I've seen seem to be elaborate fakery. They look good unless you dig in, at which point you often find that the trace merely looks plausible on the surface and that something else is going on under the hood.
I don't know what you mean by that. We always know what's going on under the hood: linear algebra, the attention mechanism, etc.

To a first approximation, all "chain of thought" means is that instead of you having to prompt the model to discuss everything in text and then decide at the end[1], it now does that more or less automatically, so you don't need to prompt it.

[1] Which used to bring about very substantial improvements in performance on some tasks
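
For anyone who never did the manual version, here is a rough sketch of what "prompt it to discuss everything, then decide at the end" looked like. The function names and prompt wording are hypothetical, made up for illustration, and no real model API is called; it only builds the prompt text and parses a response.

```python
# Hypothetical helpers for illustration only; no real model API is called here.

def direct_prompt(question: str) -> str:
    """Ask for the answer straight away, leaving no room for intermediate text."""
    return f"Question: {question}\nAnswer with only the final result."


def discuss_then_decide_prompt(question: str) -> str:
    """Manual chain of thought: ask for the working first and the decision last."""
    return (
        f"Question: {question}\n"
        "Work through the problem step by step in plain text.\n"
        "Only after that, give your decision on a final line starting with 'Answer:'."
    )


def extract_answer(model_output: str) -> str:
    """Pull the final decision out of a discuss-then-decide style response."""
    # Scan from the end so the last 'Answer:' line wins.
    for line in reversed(model_output.splitlines()):
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return ""  # the model never committed to an answer
```

With reasoning models, the second prompt is roughly what the model now does on its own, without being asked.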

I think it was clear from context that "under the hood" wasn't referring to the math but rather to the contents of the trace. What's written (often?) isn't what's actually being "thought" about. The trace is a trained output similar to the final output, which is to say that it's fake. There are research papers on the topic, particularly that models can be trained to print other arbitrary stuff during the "thinking" phase instead.

You can easily see this for yourself by carefully walking through a given trace with a critical eye. Here's an example of mine from a few days ago: https://news.ycombinator.com/item?id=47623324

Yeah, now I get what you're saying. Yes, the trace isn't what's actually happening. What's actually happening is just the attention mechanism etc. The model doesn't "think" in human language, it thinks in linear algebra. The thing is that before chain of thought, you had to get the model to output some language, because generated text was the only thing it had to attach processing to (so if you wanted more processing, you needed to get it to generate more text). Whereas now the model generates some text that is a simulacrum of the thought it might hypothetically be doing, but in actual practice chain of thought is just something they get the model to do by training it in a certain way.
Actually, I believe that behavior shows up in Gemini chats: if you are doing a visual task, it will generate intermediate diagrams. Research papers have also described approaches to that effect (generating turtle diagrams) since 2024.