Hacker News
by onion2k 742 days ago
> LLMs are not accurate; they aren't subject matter experts that'll be within maybe a 5% error margin.

You're asserting that the AI features will be removed in 3 to 5 years because they're not accurate enough today, but you actually need them to remain inaccurate in 3 years' time for your prediction to be correct.

That seems unlikely. I agree that people will start to realize the cost, but the accuracy will improve, so people might be willing to pay.


The same argument can be used for Tesla full self driving: basically it has to be (nearly) perfect, and after years of development, it's not there yet. What's different about LLMs?
They don't have to be perfect to be useful, and death isn't the price of being wrong.
Death actually can be the price of being wrong. Just wait for someone to use an AI tool for something they weren't supposed to use it for, and for the AI to spit out the worst possible "hallucination" (in terms of outcome).
What you say is true, however with self-driving cars death, personal injury, and property damage are much more immediate, much more visible, and many of the errors are of a kind where most people are qualified to immediately understand what the machine did wrong.

An LLM giving someone a detailed plan for removing a stubborn toilet stain that involves mixing the wrong combination of drain cleaners and accidentally releasing chlorine gas is going to happen, if it hasn't already. But a lot of people will read about it, go "oh, I didn't know you could gas yourself like that", and then continue to ask the same model for recipes or Norwegian wedding poetry because "what could possibly go wrong?"

And if you wonder how anyone can possibly read such a story and react that way, remember that Yann LeCun says this kind of thing despite (a) working for Facebook and (b) Facebook's algorithm getting flak not only for the current teen depression epidemic, but also from the UN for not doing enough to stop the (ongoing) genocide in Myanmar.

It's a cognitive blind spot of some kind. Plenty smart, still can't recognise the connection.

Google’s recent AI assistant has already been documented recommending people mix bleach and white vinegar for cleaning purposes.

Someone’s going to accidentally kill themselves based on an AI hallucination soon if no one has already.

There are hundreds of companies making LLMs we can choose from, and the switching cost is low. There's only one company that can make self-driving software for Tesla. Basically, competition should lead to improvements.
Tesla aren't the only people trying to make self-driving cars: famously, Uber tried, and Waymo looks like it's slowly succeeding. Competition can be useful, but it's not a panacea.
Mercedes seems to be eating Tesla's breakfast on FSD, particularly where safety and real-world implementation are concerned. Their self-driving vehicles are equipped with aqua-colored lights to alert other drivers that the car is being controlled by a computer, and Mercedes has chosen to accept liability for incidents and accidents.
In Europe, yes, especially with Level 3, which means that Mercedes is taking on the liability.

In the US it's different, because Tesla's FSD there has nothing in common with the capabilities of the FSD sold in Europe (which is a sort of glorified driver assist); the US version can clearly navigate many streets.

Mercedes in the US is very limited.

GPT-4 is one year old; GPT-3.5 is a year and a half old. Before 3.5, this wasn't really a useful technology. 7 years ago it was a research project that Google saw no value in pursuing.
Anyone claiming that accuracy of AI models WILL improve is either unaware of how they really work or is a snake oil salesman.

Forget about a model that knows EVERYTHING. Let's just train a model that is an expert not in all the law of the United States, not even in all the law of one state, but that FULLY understands the tax law of just one state, to the extent that whatever documents you throw at it, it beats a tax consultancy firm every single time.

If even that were possible, OpenAI et al. would be playing this game differently.

Why does a mobile app need to beat a highly trained professional every single time in order to be useful?

Is this standard applied to any other app?

Those use cases are never sold as "Mobile apps", but rather as "enterprise solutions", that cost the equivalent of several employees.

An employee can be held accountable, and fired easily. An AI? You'll have to talk to the Account Manager, and sit through their attempts to 'retain' you.

Because it's taxation. Financial well-being is at stake. We're even looking at potential jail time for tax fraud, tax evasion and whatnot.

My app is powered by GTPChatChat, the model beating all artificially curated benchmarks.

Still wanna buy?

This is one of those "perfect is the enemy of good" situations. Sure, for things where you have a legal responsibility to get things perfectly right, using an LLM as the full solution is probably a bad idea (although lots of accountants already use them to speed up their processes; they just check the outputs). That isn't the case for 99% of tasks, though. Something that's mostly accurate is good. People are happy with that, and they will buy it.
My experience suggests that LLMs become not less accurate, but less helpful.

Two years ago they output a solution for my query [1] right away; now they try to get the user to implement the thing themselves. This is across the board, as far as I can see.

These LLMs are not about helping anyone, their goals are engagement and mining data for that engagement.

[1] The query is "implement blocked clause decomposition in haskell." There are papers (circa 2010-2012) and there are implementations, but not in Haskell. BCD itself is easy and can be expressed in a dozen or two lines of Haskell code.
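For anyone wondering what the footnote is about: blocked clause decomposition (BCD) splits a CNF formula's clauses into two sets, each of which blocked clause elimination (BCE) can remove completely. Here's a rough sketch of one simple greedy strategy, in Python rather than Haskell. It is not taken from any paper's reference code, the function names are made up for illustration, and it assumes DIMACS-style integer literals (`v` / `-v`) with no empty or tautological clauses.

```python
from typing import FrozenSet, List, Tuple

Clause = FrozenSet[int]  # DIMACS-style literals: v means x_v, -v means NOT x_v


def _resolvent_is_tautology(c: Clause, d: Clause, lit: int) -> bool:
    """Resolve c and d on lit; True if the resolvent contains both x and -x."""
    r = (c - {lit}) | (d - {-lit})
    return any(-x in r for x in r)


def _is_blocked(c: Clause, lit: int, formula: set) -> bool:
    """c is blocked on lit if every resolvent on lit with the formula is a tautology."""
    return all(_resolvent_is_tautology(c, d, lit) for d in formula if -lit in d)


def bce_eliminates_all(clauses: List[Clause]) -> bool:
    """Run blocked clause elimination to a fixpoint; True if no clause is left."""
    remaining = set(clauses)
    changed = True
    while changed:
        changed = False
        for c in list(remaining):
            if any(_is_blocked(c, lit, remaining) for lit in c):
                remaining.discard(c)
                changed = True
    return not remaining


def pure_decompose(clauses: List[Clause]) -> Tuple[List[Clause], List[Clause]]:
    """Hypothetical greedy split: for each variable, move the larger of its
    positive/negative occurrence groups to `left` and the smaller to `right`,
    then drop both groups. Assumes no empty and no tautological clauses."""
    remaining = list(clauses)
    left: List[Clause] = []
    right: List[Clause] = []
    for v in sorted({abs(lit) for c in clauses for lit in c}):
        pos = [c for c in remaining if v in c]
        neg = [c for c in remaining if -v in c]
        big, small = (pos, neg) if len(pos) >= len(neg) else (neg, pos)
        left.extend(big)
        right.extend(small)
        remaining = [c for c in remaining if v not in c and -v not in c]
    return left, right


if __name__ == "__main__":
    f = [frozenset(c) for c in ([1, 2], [-1, 2], [-2, 3], [1, -3], [-1, -2, -3])]
    left, right = pure_decompose(f)
    print(bce_eliminates_all(left), bce_eliminates_all(right))  # prints: True True
```

Why each half is blocked: if you eliminate clauses in the order they were moved, the literal a group was split on has no complement left in that half, so those clauses are trivially blocked. Since BCE reaches the same fixpoint regardless of order, the checker's arbitrary order gives the same answer.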

> These LLMs are not about helping anyone, their goals are engagement and mining data for that engagement.

Wow, this is a really interesting idea! A sneaky play for LLM providers is to be helpful enough to still be used, but also sufficiently unhelpful that your users give you additional training data.

This is obvious in retrospect: instead of making LLMs work better, LLMs' handlers invented various techniques to make LLMs look like they work better; summarization is one example. Next-gen LLMs then get trained on that data.

Now, instead of getting an answer right away, the user has to engage in a discussion, which increases the sunk cost of working with LLMs.