Hacker News
by chewxy 983 days ago
I note something very interesting in the AI hype, and I would like someone to help explain it.

Whenever there's a news story or article noting the limits of current LLM tech (especially the GPT class of models from OpenAI), there's always a comment that says something along the lines of "ah did you test it on GPT-4"?

Or if it's clear that it's a limitation of GPT-4, then you get comments along the lines of "what's the prompt?" or "the prompt is poor". Usually, it's someone who has never indicated that they understand that prompt engineering is model-specific, and that the paper's point is to make a more general claim rather than a claim about one model.

Can anyone explain this? It's like the mere mention of LLMs being limited in X, Y, Z fashion offends their lifestyle/core beliefs. Or perhaps it's a weird form of astroturfing. If so, to what end?

7 comments

> there's always a comment that says something along the lines of "ah did you test it on GPT-4"?

Perhaps because whenever there's "a news or article noting the limits of current LLM tech", it's a bit like someone tried to play a modern game on a machine they found in their parents' basement, and the only appropriate response to this is, "have you tried running it on something other than a potato"? This has been happening so often over the past few months that it's the first red flag you check for.

GPT-4 is still qualitatively ahead of all other LLMs, so outside of articles addressing specialized aspects of different model families, the claims are invalid unless they were tested on GPT-4.

(Half the time the problem is that the author used the ChatGPT web app and did not even realize there are two models and they've been using the toy one.)

Speaking as someone who has this instinct myself: there is a line of reactionism to modern AI/ML that says, "this is just a toy, look, it can't do something simple." But it is often the case that it _can_ do that thing with either a more advanced model or a more built-out system. So the instinct is to try to explain that the pessimism is wrong, that we really can push the boundary and do more, even if it isn't going to work out of the box yet. I react that way against all forms of poppy snipping.

Hyping up tech based on what you think it will be able to do in the future is exactly the misplaced overhyping that is the problem. The issues people say are easy to fix aren't easy to fix.

Expect the model to continue to perform like it does today, with lots of dumb integrations added to it, and you will get a very accurate prediction of how most new tech hype turns out. Dumb integrations can't add intelligence, but they can add a lot of value, so the rational hype still sees this as a very valuable and exciting thing, but it isn't a complete revolution in its current form.

The output of any model is essentially random and whether it is useful or impressive is a coin flip. While most people get a mix of heads and tails, there are a few people at any time that are getting streaks of one head after another or vice versa.

So my perception is that this leads to people who have good luck and perceive LLMs as near AGI because the models arrive at a useful answer more often than not. These people cannot believe there are others who have bad luck and get worthless output from their LLM, like someone at a roulette table exhorting "have you tried betting it all on black? worked for me!"

1. Just like it's frustrating when a paper is published making claims that are hard to verify, it's frustrating when somebody says "x can't do y" in a way that is hard to verify^^

2. LLMs, in spite of the complaints about the research leaders, are fairly democratic. I have access to several of the best LLMs currently in existence and the ones I can't access haven't been polished for general usage anyway. If you make a claim with a prompt, it's easy for me to verify it

3. I've been linked legitimate ChatGPT prompts where someone gets incorrect data from ChatGPT - my instinct is to help them refine their prompt to get correct data

4. If you make a claim about these cool new tools (not making a claim about what they're good for!) all of these kick in. I want to verify, refine, etc.

Of course some people are on the bandwagon and it is akin to insulting their religion (it is with religious fervor they hold their beliefs!) but at least most folks on hn are just excited and trying to engage

^^ I actually think making this claim is in bad form generally. It's like looking for the existence of aliens on a planet. Absence of evidence is not evidence of absence

If someone comes here and says "<insert programming language> cannot do X" and that is wrong, or perhaps outdated, don't you feel that the reaction would be similar?

If you are trying to make categorical statements about what AI is unable to do, at the very least you should use a state-of-the-art system, which conveniently is easily available for everyone.

Because they're saying it can't do something when they're holding it wrong.

It's a weird thing to get hung up on if you ask me.

Perhaps they are trying to help people get the best out of a tool which they themselves find very useful?