by Kapura 321 days ago
I have often thought about how computers are significantly faster than they were in the early 2000s, but they are significantly harder to use. Using Linux for the first time in college was a revelation, because it gave me the tools to tell the computer "rename all of the files in this directory, keeping only the important parts of the name."

But instead of iterating on better interfaces to effectively utilize the N thousands of operations per second a computer is capable of, the powers that be behind the industry have decided to invest billions of dollars in GPUs to get a program that seems like it understands language, but is incapable of counting the number of B's in "blueberry."


IDK, I think there is something adorable about taking a system that over trillions of iterations always performs the same operation with the same result, reliability unmatched in all of the universe…

And making it more of “IDK why it answered the way it did, but it might be right!!”

Humans like games :)
Prompt: "Spell blueberry and count the letter b".

They're not claiming AGI yet, so human intelligence is required to operate an LLM optimally. It's well known that LLMs process tokens rather than characters, so without space for "reasoning" there's no representation of the letter b in the prompt. Telling it to spell or think about it gives it room to spell it out, and from there it can "see" the letters and it's trivial to count.
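A rough Python sketch of the tokens-versus-characters point above. The split into "blue" + "berry" and the integer IDs are made up for illustration; a real tokenizer may split the word differently and will use different IDs.

```python
# Hypothetical subword vocabulary: the pieces and IDs are invented for this sketch.
HYPOTHETICAL_VOCAB = {"blue": 1001, "berry": 1002}


def toy_tokenize(word, pieces):
    """Map hand-chosen pieces of a word to opaque integer IDs (illustration only)."""
    if "".join(pieces) != word:
        raise ValueError("pieces must concatenate back to the word")
    return [HYPOTHETICAL_VOCAB[p] for p in pieces]


# What the model is handed: two integers, with no letter 'b' visible in them.
token_ids = toy_tokenize("blueberry", ["blue", "berry"])

# What "spelling it out" produces: one item per character, which is trivial to count.
spelled = list("blueberry")
b_count = spelled.count("b")

print(token_ids)  # [1001, 1002]
print(spelled)
print(b_count)    # 2
```

Once the word is spelled out character by character, the count is a plain lookup; on the integer IDs alone there is nothing to count.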

perl -e 'print scalar grep {/b/} split //, "blueberry"'

echo blueberry | grep -o 'b' | wc -l

echo blueberry | perl -ne 'print scalar (() = m/(b)/g)'

echo blueberry | tr -d '\n' | tr b '\n' | wc -l

echo -n blueberry | tr b '\n' | wc -l

So long as I’m teaching the user how to speak to the computer for a specific edge case, which of these burn nearly as much power as your prompt? Maybe we should consider that certain problems are suitable to LLMs and certain ones should be handled differently, even if that means getting the LLM to recognize its own edge cases and run canned routines to produce answers.
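A minimal Python sketch of the "recognize the edge case and run a canned routine" idea. The regex, the function name `route`, and the convention of returning `None` to mean "hand this to the model" are all assumptions made for illustration, not how any real LLM product dispatches tools.

```python
import re

# Hypothetical pattern for one narrow question shape, e.g.
#   How many B's are in "blueberry"?
# Real prompts vary far more than this; it is only a sketch.
LETTER_COUNT = re.compile(r"how many (\w)\W?s (?:are )?in \W?(\w+)", re.IGNORECASE)


def route(question):
    """Return a deterministic count for letter-counting questions, else None."""
    match = LETTER_COUNT.search(question)
    if match is None:
        return None  # not a recognised edge case; defer to the model
    letter = match.group(1).lower()
    word = match.group(2).lower()
    return word.count(letter)


print(route('How many B\'s are in "blueberry"?'))  # 2
print(route("Summarize this article for me"))      # None
```

The canned path costs a regex match and a string scan; everything else still falls through to the model.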

If you're going to need to learn how to use a tool, why not learn to use the efficient and precise one?
Because there aren't more efficient and precise tools capable of the same things?
Is counting the number of B's vital? Also, I'm pretty sure you can get an LLM to parse text the way you want it, it just doesn't see your text as you do, so that simple operation is not straightforward. Similarly, are you worthless because you seem like you understand language but are incapable of counting the number of octets in "blueberry"?
Let's say I hire a plumber because of his plumbing expertise and he bills me $35 and I pay him with a $50 bill and he gives me back $10 in change. He insists he's right about this.

I am now completely justified in worrying about whether the pipes he just installed were actually made of butter.

Really? Is it that easy? It happens quite often that someone really believes something and is wrong. Maybe you are both right and the $5 bill is on the floor?
> Similarly, are you worthless because you seem like you understand language but are incapable of counting the number of octets in "blueberry"?

Well, I would say that if GP advertised themselves as being able to do so, and confidently gave an incorrect answer, their function as someone who is able to serve their advertised purpose is practically useless.

So ChatGPT was advertised as a letter counter?

Also, no matter what hype or marketing says: GPT is a statistical word bag with a mostly invisible middleman to give it a bias.

A car is a private transportation vehicle but companies still try to sell it as a lifestyle choice. It's still a car.

It is (maybe not directly but very insistently) advertised as taking many jobs soon.

And counting stuff you have in front of yourself is a basic skill required everywhere. Counting letters in a word is just a representative task for counting boxes with goods, or money, or kids in a group, or rows on a list on some document, it comes up in all kinds of situations. Of course people insist that AI must do this right. The word bag perhaps can't do it but it can call some better tool, in this case literally one line of Python. And that is actually the topic the article touches on.
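For reference, the one line in question could look something like this. The wrapper name `count_letter` and the choice to ignore case are assumptions of this sketch; the counting itself is just the built-in `str.count`.

```python
def count_letter(word, letter):
    """Count occurrences of a single letter in a word, ignoring case."""
    return word.lower().count(letter.lower())


print(count_letter("blueberry", "B"))  # 2
```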

People always insist that any tool must do things right. They also insist that people do things right.

Tools are not perfect, people are not perfect.

Thinking that LLMs must do things right, that people find simple, is a common mistake, and it is common because we easily treat the machine as a person, while it only is acting like one.

> Thinking that LLMs must do things right, that people find simple, is a common mistake

Show me any publicly known visible figure that tries to rectify this. Everyone peddles hype, there's no more Babbage as in the "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" anecdote.

People and tools that don't do things right aren't useful. They get replaced. Making do with a shitty tool might make sense economically but not in any other way.
It is advertised as being able to "analyze data" and "answer complex questions" [0], so I'd hope for it to reliably determine when to use its data-analysis capabilities to answer questions, if nothing else.

[0] http://web.archive.org/web/20250729225834/https://openai.com...

Here I am, a mind the size of a planet and what am I doing? Parking cars. - Marvin
As shown by the GPT-5 reaction, a majority of people just have nothing better to ask the models than how many times does the letter "s" appear in "stupid".
I think this is a completely valid thing to do when you have Sam Altman going on the daily shows and describing it as a genius in your pocket and how it's smarter than any human alive. Deflating hype bubbles is an important service.
Yeah: Like with self-driving vehicles, the characteristics of when and how something breaks are important, not just some average error-rate.

If users cannot anticipate what does or doesn't constitute risky usage or potential damages, things go Extra Wrong.

But the point is, why would you trust it for anything at all when it can't do an incredibly simple thing reliably? (Yes, I understand the tokenizer makes this hard, but still, it's a quick demonstration that it's just bad technology.)
2 time(s)
I mean, I think that anyone who understands UTF-8 will know that there are nine octets in blueberry when it is written on a web page. If you wanted to be tricky, you could have thrown a Β in there or something.
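A quick Python check of the octet count, as a sketch. The helper name `utf8_octets` is made up here; the second string swaps the first letter for a Greek capital beta (U+0392), which looks like a Latin B but takes two bytes in UTF-8.

```python
def utf8_octets(text):
    """Number of bytes (octets) the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


ascii_word = "blueberry"
tricky_word = "\u0392lueberry"  # Greek capital beta in place of the first letter

print(utf8_octets(ascii_word))   # 9
print(utf8_octets(tricky_word))  # 10
print(len(tricky_word))          # 9 characters, but 10 octets
```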
> anyone who understands UTF-8

So not too many?

If I have to talk to it in a specific way, why not just use programming? That's the specific way we talk to computers effectively...