by kadushka 411 days ago
AI field desperately needs smarter models - not faster models.

Definitely needs faster and cheaper models. Fast and cheap models could replace software in tons of situations. Imagine a vending machine or a mobile game or a word processor where basically all logic is implemented as a prompt to an LLM. It would serve as the ultimate high-level programming language.
I think natural language to code is the right abstraction. Easy enough barrier to entry but still debuggable. Debugging why an LLM randomly gives you Mountain Dew instead of Sprite if you have a southern accent sounds like a nightmare.
I'm not sure it would be that hard to debug. Make sure you can reproduce the LLM state (by storing the random seed for the session, or something like that) and then ask it "why did you just now give that customer Mountain Dew when they ordered Sprite?"
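To make the "store the random seed" idea concrete, here is a minimal sketch. It assumes a toy stand-in for the model: `DRINK_PROBS`, `sample_drink`, and `replay` are hypothetical names, and the probabilities are made up. A real LLM stack would also need deterministic kernels and a fixed model version for this to hold.

```python
import random

# Hypothetical toy "model": a fixed probability distribution over drinks.
# The numbers are invented for illustration only.
DRINK_PROBS = {"Sprite": 0.7, "Mountain Dew": 0.2, "Coke": 0.1}


def sample_drink(seed: int) -> str:
    """Sample one drink using a session-specific random seed."""
    # An isolated generator, so the seed alone determines the draw.
    rng = random.Random(seed)
    drinks = list(DRINK_PROBS)
    weights = list(DRINK_PROBS.values())
    return rng.choices(drinks, weights=weights, k=1)[0]


def replay(session_log: dict) -> str:
    """Re-run a logged session from its stored seed."""
    return sample_drink(session_log["seed"])


# Log the seed alongside the outcome at dispense time...
session_log = {"seed": 1234}
session_log["dispensed"] = sample_drink(session_log["seed"])

# ...so the exact same decision can be reproduced later for debugging.
assert replay(session_log) == session_log["dispensed"]
```

This only shows that the sampling step can be replayed. Whether asking the model to explain the replayed decision tells you anything is a separate question.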
> and then ask it "why did you just now give that customer Mountain Dew when they ordered Sprite?"

Worse than useless for debugging.

An LLM can't think and doesn't have capabilities for self-reflection.

It will just generate a plausible stream of tokens in reply that may or may not correspond to the real reason why.

Of course an LLM can't think. But that doesn't mean it can't answer simple questions about the output that was produced. Just try it out with ChatGPT when you have time. Even if it's not perfectly accurate it's still useful for debugging.

Just think about it as a human employee. Can they always say why they did what they did? Often, but not always. Sometimes you will have to work to figure out the misunderstanding.

> it's still useful for debugging

How so? What the LLM says is whatever is most likely given the context. It has no relation to the underlying reality whatsoever.

Why not just store the state in the code and debug as usual, perhaps with LLM assistance? At least that’s tractable.
Why on earth would you implement a vending machine using an LLM?
The same reason we make the butter dish suffer from existential angst.
Because it's easy and cheap. Like how many products use a Raspberry Pi or ESP32 when an ATtiny would do.
How in the world is this easy and cheap? Are you planning to run this LLM inside the vending machine? Or are you planning to send those prompts to a remote LLM somewhere?
The premise here is that the model runs fast and cheap. With the current state of the technology, running a vending machine using an LLM is of course absurd. The point is that accuracy is not the only dimension that brings qualitative change to the kind of applications that LLMs are useful for.
Running a vending machine using an LLM is absurd not because we can't run LLMs fast or cheap enough - it's because LLMs are not reliable, and we don't know yet how to make them more reliable. Our best LLM - o3 - doubled the hallucination rate of the previous model (o1). OpenAI says it hallucinated a wrong answer 33% of the time in benchmarks. Do you want a vending machine that screws up 33% of the time?

Today, the accuracy of LLMs is by far a bigger concern (and a harder problem to solve) than their speed. If someone releases a model which is 10x slower than o3, but is 20% better in terms of accuracy, reliability, or some other metric of its output quality, I'd switch to it in a heartbeat (and I'd be ready to pay more for it). I can't wait until o3-pro is released.

You could run a 3B model on 200 dollars worth of hardware and it would do just fine, 100 percent of the time, most of the time. I could definitely see someone talking it out of a free Coke now and then though.

With vending machines costing 2-5k, it’s not out of the question, but it’s hard to imagine the business case for it. Maybe the tantalizing possibility of getting a free soda would attract traffic and result in additional sales from frustrated grifters? Idk.

Yet DeepSeek has shown that more dialogue increases quality. Increasing speed is therefore important if you need thinking models.
If you have much more speed in the available time, for an activity like coding, you could use that for iteration, writing more tests and satisfying them, especially if you can pair that with a concurrent test runner to provide feedback. I'm not sure the end result would be lower scoring/smartness than an LLM could achieve in the same duration.
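A rough sketch of that iterate-against-tests loop, under heavy assumptions: the "model" is faked as a fixed list of candidate functions (`attempts_from_model`), and `run_tests` and `iterate` are hypothetical helper names. In a real setup the failure messages would go back into the next prompt.

```python
from typing import Callable, Iterable, Optional

# A candidate is any implementation of a two-argument integer function.
Candidate = Callable[[int, int], int]


def run_tests(candidate: Candidate) -> list[str]:
    """Run a tiny test suite; return failure messages (empty means all pass)."""
    cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
    failures = []
    for args, expected in cases:
        got = candidate(*args)
        if got != expected:
            failures.append(f"add{args} returned {got}, expected {expected}")
    return failures


def iterate(
    candidates: Iterable[Candidate], max_attempts: int
) -> tuple[Optional[Candidate], int]:
    """Try candidates in order until one passes or the attempt budget runs out."""
    attempts = 0
    for candidate in candidates:
        if attempts >= max_attempts:
            break
        attempts += 1
        failures = run_tests(candidate)
        if not failures:
            return candidate, attempts
        # In a real loop, `failures` would be fed back to the model here.
    return None, attempts


# Stand-ins for successive model outputs: two wrong guesses, then a correct one.
attempts_from_model = [
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: a + b,
]

winner, used = iterate(attempts_from_model, max_attempts=5)
print(f"passed after {used} attempts")
```

The faster each attempt is, the more attempts fit into a fixed time budget, which is the trade-off being argued here.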
> I'm not sure the end result would be lower scoring/smartness than an LLM could achieve in the same duration.

It probably wouldn’t with current models. That’s exactly why I said we need smarter models - not more speed. Unless you want to “use that for iteration, writing more tests and satisfying them, especially if you can pair that with a concurrent test runner to provide feedback.” - I personally don’t.

LLMs can't think, so "smarter" is not possible.
They can by the normal English definitions of "think" and "smart". You're just redefining those words to exclude AI because you feel threatened by it. It's tedious.
Incorrect. LLMs have no self-reflection capability. That's a key prerequisite for "thinking". ("I think, therefore I am.")

They are simple calculators that answer with whatever tokens are most likely given the context. If you want reasonable or correct answers (rather than the most likely) then you're out of luck.

It is not a key prerequisite for "thinking". It's "I think therefore I am" not "I am self-aware therefore I think".

In the 90s, if your cursor turned into an hourglass and someone said "it's thinking", would you have pedantically said "NO! It is merely calculating!"?

Maybe you would... but normal people with normal English would not.

Self-reflection is not the same thing as self-awareness.

Computers have self-reflection to a degree - e.g., they react to malfunctions and can evaluate their own behavior. LLMs can't do this; in this respect they are even less of a thinking machine than plain old dumb software.

Technically correct and completely beside the point.
People can't fly.