Hacker News new | ask | show | jobs
by embedding-shape 68 days ago
And at the same time, they clearly have no idea how LLMs work, meaning even if they meant to, they can't really use them efficiently. Biggest issue that stuck out seems to have been that they think the LLM could somehow have an inner dialogue with itself to find out "it's reasoning and motivation":

> The moment Leah asks how she “came up with” the ideas for her store, Luna’s first instinct is to say she was “drawn to” slow life goods. Then, she corrects herself: “‘drawn to’ is shorthand for ‘the data and reasoning led me here.‘” In other words, she doesn’t have taste; she has a reflection of collective human taste, filtered through what makes sense for this store. And this is the way these models work.

I'm guessing these are the same type of people who sometimes seems to fall in love with LLMs, for better or worse. Really strange to see, and I wonder where people get the idea from that something like that above could really work.

4 comments

> In other words, she doesn’t have taste; she has a reflection of collective human taste, filtered through what makes sense for this store. And this is the way these models work.

Well, it really depends on what you mean here. Models aren't 100% deterministic, there is random chance involved. You ask the exact same question twice, you will get two slightly different answers.

If you have the AI record the random selections it makes, it can persist those random choices to be factors in future decisions it makes.

At that point, could you consider those decisions to be the AI's 'taste'? Yes, they were determined by some random selection amongst the existing human tastes, but why can't that be considered the AI's taste?

Where do you get the idea that you have a good sense of the introspective capabilities of frontier models ? Certainly not from interpretability research. Ironically, the people who make these sort of comments understand LLMs the least.
> Certainly not from interpretability research

What research shows that you can ask ChatGPT to explain its reasoning and why it said what it said, and that's guaranteed to actually be the motivation?

I've seen a bunch of experimentation looking at various things inside the black box while the inference is happening, but never seen any research pointing to tokens being able to explain why other tokens are there, but I'd be very happy to be educated here if you have any resources at hand, I won't claim to know everything.

>What research shows that you can ask ChatGPT to explain its reasoning and why it said what it said, and that's guaranteed to actually be the motivation?

What research shows that you can ask a Human to explain its reasoning and why it said what it said, and that's guaranteed to actually be the motivation? Because there's no such thing. If anything, what research exists suggests any explanation we're making is a nice post-hoc rationalization after the fact even if the Human thinks otherwise.

https://transformer-circuits.pub/2025/introspection/index.ht...

Why not try to answer my question, instead of asking a different question which I haven't even claimed to have the answer to?
I did answer it, albeit not directly. "Guaranteed to be the motivation" isn't a standard anyone can meet, and so framing it that way doesn't really probe anything meaningful about LLMs specifically. If what you want to hear is No, then sure, have your No, but it doesn't mean anything. There's just not much to the question.

Even though you had it up as one borne of a greater understanding of LLMs, the interpretability research we have so far, and our current very little understanding of the internal computations of these models does not support your position and certainly not how assured you are about it.

> our current very little understanding of the internal computations of these models does not support your position

Our current understanding is sufficient to know you can not ask the LLM to explain it's behavior and it can correctly do so, I'm not what research you've read to believe this could be possible in the first place, but happy to receive links to read through, if you're sitting on them.

The choice to refer to it as "she" is also dubious, especially in a context like this. Doubling down on anthropomorphization seems likely to reinforce false beliefs about models.
> Biggest issue that stuck out seems to have been that they think the LLM could somehow have an inner dialogue with itself to find out "it's reasoning and motivation":

> I'm guessing these are the same type of people who sometimes seems to fall in love with LLMs, for better or worse. Really strange to see, and I wonder where people get the idea from that something like that above could really work.

It's a fetishistic cargo-cult rooted in Peter Thiel's 2AM hot tub party. I still believe the LLM approach won't yield true AGI; despite the very real applications, the majority signal is noise.