| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by kadushka 457 days ago
	AI field desperately needs smarter models - not faster models.

4 comments

geysersam 457 days ago

Definitely needs faster and cheaper models. Fast and cheap models could replace software in tons of situations. Imagine a vending machine or a mobile game or a word processor where basically all logic is implemented as a prompt to an llm. It would serve as the ultimate high level programming language.

janalsncm 457 days ago

I think natural language to code is the right abstraction. Easy enough barrier to entry but still debuggable. Debugging why an LLM randomly gives you Mountain Dew instead of Sprite if you have a southern accent sounds like a nightmare.

geysersam 457 days ago

I'm not sure it would be that hard to debug. Make sure you can reproduce the llm state (by storing the random seed for the session, or something like that) and then ask it "why did you just now give that customer mountain dew when they ordered sprite?"

otabdeveloper4 457 days ago

> and then ask it "why did you just now give that customer mountain dew when they ordered sprite?"

Worse than useless for debugging.

An LLM can't think and doesn't have capabilities for self-reflection.

It will just generate a plausible stream of tokens in reply that may or may not correspond to the real reason why.

geysersam 457 days ago

Of course a llm can't think. But that doesn't mean it can't answer simple questions about the output that was produced. Just try it out with chatgpt when you have time. Even if it's not perfectly accurate it's still useful for debugging.

Just think about it as a human employee. Can they always say why they did what they did? Often, but not always. Sometimes you will have to work to figure out the misunderstanding.

otabdeveloper4 457 days ago

> it's still useful for debugging

How so? What the LLM says is whatever is more likely given the context. It has no relation to the underlying reality whatsoever.

janalsncm 457 days ago

Why not just store the state in the code and debug as usual, perhaps with LLM assistance? At least that’s tractable.

suddenlybananas 457 days ago

Why on earth would you implement a vending machine using an LLM?

K0balt 457 days ago

The same reason we make the butter dish suffer from existential angst.

jacob019 457 days ago

Because it's easy and cheap. Like how many products use a Raspberry Pi or ESP32 when an ATtiny would do.

kadushka 457 days ago

How in the world is this easy and cheap? Are you planning to run this LLM inside the vending machine? Or are you planning to send those prompts to a remote LLM somewhere?

geysersam 457 days ago

The premise here is that the model runs fast and cheap. With the current state of the technology running a vending machine using an LLM is of course absurd. The point is that accuracy is not the only dimension that brings qualitative change to the kind of applications that LLMs are useful for.

kadushka 457 days ago

Running a vending machine using an LLM is absurd not because we can't run LLMs fast or cheap enough - it's because LLMs are not reliable, and we don't know yet how to make them more reliable. Our best LLM - o3 - doubled the previous model (o1) hallucination rate. OpenAI says it hallucinated a wrong answer 33% of the time in benchmarks. Do you want a vending machine that screws up 33% of the time?

Today, the accuracy of LLMs is by far a bigger concern (and a harder problem to solve) than its speed. If someone releases a model which is 10x slower than o3, but is 20% better in terms of accuracy, reliability, or some other metric of its output quality, I'd switch to it in a heartbeat (and I'd be ready to pay more for it). I can't wait until o3-pro is released.

K0balt 457 days ago

You could run a 3B model on 200 dollars worth of hardware and it would do just fine, 100 percent of the time, most of the time. I could definitely see someone talking it out of a free coke now and then though.

With vending machines costing 2-5k, it’s not out of the question, but it’s hard to imagine the business case for it. Maybe the tantalizing possibility of getting a free soda would attract traffic and result in additional sales from frustrated grifters? Idk.

Wazako 457 days ago

Yet deepseek has shown that more dialogue increases quality. Increasing speed is therefore important if you need thinking models.

guiriduro 457 days ago

If you have much more speed in the available time, for an activity like coding, you could use that for iteration, writing more tests and satisfying them, especially if you can pair that with a concurrent test runner to provide feedback. I'm not sure the end result would be lower scoring/smartness than an LLM could achieve in the same duration.

kadushka 457 days ago

I'm not sure the end result would be lower scoring/smartness than an LLM could achieve in the same duration.

It probably wouldn’t with current models. That’s exactly why I said we need smarter models - not more speed. Unless you want to “use that for iteration, writing more tests and satisfying them, especially if you can pair that with a concurrent test runner to provide feedback.” - I personally don’t.

otabdeveloper4 457 days ago

LLM's can't think, so "smarter" is not possible.

IshKebab 457 days ago

They can by the normal English definitions of "think" and "smart". You're just redefining those words to exclude AI because you feel threatened by it. It's tedious.

otabdeveloper4 457 days ago

Incorrect. LLM's have no self-reflection capability. That's a key prerequisite for "thinking". ("I think, therefore I am.")

They are simple calculators that answer with whatever tokens are most likely given the context. If you want reasonable or correct answers (rather than the most likely) then you're out of luck.

IshKebab 457 days ago

It is not a key prerequisite for "thinking". It's "I think therefore I am" not "I am self-aware therefore I think".

In the 90s if your cursor turned into an hourglass and someone said "it's thinking" would you have pedantically said "NO! It is merely calculating!"

Maybe you would... but normal people with normal English would not.

otabdeveloper4 456 days ago

Self-reflection is not the same thing as self-awareness.

Computers have self-reflection to a degree - e.g., they react to malfunctions and can evaluate their own behavior. LLMs can't do this, in this respect they are even less of a thinking machine than plain old dumb software.

baq 457 days ago

Technically correct and completely besides the point.

K0balt 457 days ago

People cant fly.