
by badloginagain 1149 days ago

If I understand correctly, the meat of the argument is "that is a system for every (∀) task, there exists (∃) a setting that gives the correct answer for that one task."

My understanding of this (correct me if I'm wrong) is that the scam is convincing users that GPT-X can do anything with, say, the correct prompts.

This argument misses the mark for me. It's not that it solves all the problems, it's that the problems it does solve are economically impactful. Significantly economically impactful in some cases- obvious examples of call centers and first-line customer support.

5 comments

debaserab2 1149 days ago

> Significantly economically impactful in some cases- obvious examples of call centers and first-line customer support.

Is it that obvious?

Yesterday I had a trivial but uncommon issue with my pharmacy. I reached out to them online - their chatbot was the only channel available. I tried, over the course of 20 minutes and 3 restarted sessions, to communicate an issue that a human would have been able to respond to in 30 seconds. Eventually I just gave up and got the prescription filled elsewhere.

No doubt this pharmacy saved money by cutting support staff. I just think it's easy to see these solutions and cost savings without bothering to look at how much of a frustrating experience it can be for a customer.

bombcar 1149 days ago

It is very easy to measure costs associated with a customer.

It is nearly impossible to measure the customers lost.

And you may never return, which they’ll never know.

pixl97 1149 days ago

Unless all pharmacies go to automation, in which case you're screwed.

And don't think it can't happen. Consolidation into just a few companies is happening in a huge number of industries.

bombcar 1149 days ago

That’s the killer, and it gets bad really fast once an industry “decides” on something. And people simply fall through the cracks.

tanseydavid 1149 days ago

Do you have any reason to believe that the Chatbot was GPT3.5 or GPT4 based?

Waterluvian 1149 days ago

I’d be surprised if something like a pharmacy managed to adopt a tech that quickly. In my experience non-tech industries often take quite a while to adopt.

martyvis 1149 days ago

I have seen plenty of chatbots used by my IT company and the 3rd party suppliers I deal with. They really just turn what used to be a phone tree into something text-based. In my experience it's pretty basic keyword search and recipes, and I usually like to escape to a human as soon as I can. I'd welcome a proper conversational AI chatbot that actually gets stuff done.

hartator 1149 days ago

How are you gonna answer things like pricing? Issue with pharmacies is the super complicated and super secretive pricing structure. A good UI can solve this if they want to drop the secrecy.

debaserab2 1148 days ago

I don't think so; I assume it's some more dated ML approach. My point is more so that it's not obvious that it's a good solution for this problem yet.

yowzadave 1149 days ago

Seems unlikely they would be trusting a system that so willingly makes up stuff to provide a customer with life-critical advice about medications.

tobyjsullivan 1149 days ago

The argument is more nuanced. Importantly, the article is not making a judgement on the value of GPT, generally (at least not explicitly). It is arguing against one specific narrative.

That narrative goes like this:

Step 1: give me a specific task T from a class of tasks C.

Step 2: I’ll show you that I can formulate a prompt to solve task T.

Step 3: therefore, if you engineer prompts well enough, you can find a single prompt that will solve any task in class C.

The argument is that Step 3 is a non sequitur and that’s the scam.
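A minimal sketch of why Step 3 does not follow from Steps 1 and 2. The prompt names, task names, and the `solves` table are hypothetical toy data, not anything from the article; the point is only that "every task has some prompt" and "some prompt handles every task" are different checks.

```python
# Hypothetical toy data: which tasks each prompt happens to solve.
solves = {
    "prompt_a": {"task_1", "task_2"},
    "prompt_b": {"task_2", "task_3"},
}
tasks = {"task_1", "task_2", "task_3"}


def every_task_has_some_prompt(tasks, solves):
    """For every task there exists a prompt that solves it (the Step 1-2 demo)."""
    return all(any(t in solved for solved in solves.values()) for t in tasks)


def some_prompt_solves_every_task(tasks, solves):
    """There exists one prompt that solves every task (the Step 3 claim)."""
    return any(tasks <= solved for solved in solves.values())


forall_exists = every_task_has_some_prompt(tasks, solves)
exists_forall = some_prompt_solves_every_task(tasks, solves)
print(forall_exists, exists_forall)  # True False
```

Each task can be demoed successfully one at a time, yet no single prompt covers the whole class.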

The question isn’t “is GPT useful” so much as “can products be built on top of GPT?”

crdrost 1149 days ago

More formally, the system specifies a set S of predicates that it can evaluate based on configurations (prompts etc) applied to the general method.

You have a particular predicate p that you want to evaluate, in some larger space of “appropriate” predicates P. Machine learning then usually gives you some claim that,

    ∀p in P. ∃s in S. p(x) → s(x)

Call this last bit “weak prediction.” p is not easily computable, or else you would not use machine learning, but machine learning can compute any s in S, and if we find this s then we can use s(x) as evidence for p(x), by Bayes' theorem or so. Machine learning has never affirmed strong prediction; in particular, there have always been ways to maliciously modify inputs. You look at the details of the machine s and you have s(x, y), “I classify an x as a y,” and s(x + Δx, y'), “I classify an x + Δx as a completely different y',” where humans literally cannot tell the difference between x and x + Δx because the alterations are in the “noise” of the data.

The “scam” is that you then tell people to hand-calculate a subset X ⊆ p (that is, p(x) holds for every x in X), and then sell this as,

    ∃s in S. ∀x in X. s(x)

What's the problem? I think the claim hints that there are a few:

1. Selection bias in claimed accuracy. You generate candidates s¹, s², s³, ... and analyze their accuracy over X to pick one, say the one that gets 96% of X right. The accuracy of the selected solution sⁿ is not 96%, that's lying to yourself. The proper way to use this is to partition X into X¹ + X², use data X¹ to select sⁿ, then evaluate its actual accuracy on X². (This is how I originally read the article.) In particular there is a loss of p(x) from the right hand side of the new expression, suggesting “alignment” lapses, e.g. the machine learning algorithm that appeared to “learn” to identify tanks but actually was identifying clouds in the sky.

2. Selection bias in terms of problems solved. Researchers generate problems p¹, p², p³ ... in P and we hear about pⁿ having solution sⁿ with 99% accuracy, which gives us an unrepresentative idea of what the success rates look like in P overall. (This is how I read your take, and it meshes better with the final comments in the post.)

3. This broader context looks suspiciously like a problem where you can drive the false positive rate arbitrarily low by raising the false negative rate arbitrarily high, which roughly might explain the tendency of ChatGPT et al. to hallucinate.
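To make point 1 concrete, here is a rough simulation, assuming every candidate s is a pure coin flip (so its true accuracy is 50%). The sizes, the seed, and all names are made up for illustration. Picking the best candidate on X¹ and reporting that same number overstates accuracy; the held-out X² gives the honest estimate.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N_CANDIDATES = 200  # hypothetical number of candidate settings s tried
N_SELECT = 50       # size of X1, used to pick a candidate
N_HOLDOUT = 2000    # size of X2, used only for the final estimate


def accuracy(predictions, labels):
    """Fraction of positions where the prediction matches the label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Hand-labelled ground truth p(x) for X = X1 + X2.
labels = [random.randint(0, 1) for _ in range(N_SELECT + N_HOLDOUT)]
x1_labels, x2_labels = labels[:N_SELECT], labels[N_SELECT:]

# Every candidate guesses at random, so none is truly better than 50%.
candidates = [
    [random.randint(0, 1) for _ in range(len(labels))]
    for _ in range(N_CANDIDATES)
]

# Select the candidate that looks best on X1 only.
best = max(candidates, key=lambda preds: accuracy(preds[:N_SELECT], x1_labels))

selection_acc = accuracy(best[:N_SELECT], x1_labels)  # optimistic: used for selection
holdout_acc = accuracy(best[N_SELECT:], x2_labels)    # honest: never seen during selection

print(f"accuracy on X1 (used for selection): {selection_acc:.2f}")
print(f"accuracy on X2 (held out):           {holdout_acc:.2f}")
```

The gap between the two printed numbers is the selection bias: the winner on X¹ was lucky there, and that luck does not carry over to X².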

alexvoda 1147 days ago

This reminds me of issues raised by Sabine Hossenfelder with regard to grand unified theories, theories of everything, or string theory.

https://www.youtube.com/watch?v=lu4mH3Hmw2o

https://www.youtube.com/watch?v=mdu9KvLxHFg

jameshart 1149 days ago

Right

#1 it’s not clear that this ∃ ∀ construction is a fair representation of what is being ‘sold’ by GPT-x

#2 it’s also not clear what this proposed inverted formulation (∀ ∃) that describes what the author thinks GPT actually is even means. For every setting there exists a task that it answers? Does that even make sense?

justeleblanc 1149 days ago

Pretty sure you should read "for every task there exists a setting".

jameshart 1149 days ago

But what's the inverse?

317070 1149 days ago

there exists a setting which will work for all your test points.

The author's caveat is (I think) that if you have a task, you collect a set of points (questions) on which you will test it. Then you tune your setting (prompt) until it works for your test points (questions).

After that procedure, you do not know if that prompt solves the original task. You might have overfitted to your test points.

And by repeatedly doing this overfitting for various tasks, you are not gathering evidence that a good setting truly exists for all tasks.
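An extreme toy version of that overfitting, assuming a made-up task ("is n even?") and a "tuned setting" that simply memorises the test points. Nothing here models a real LLM or prompt; it only shows that passing every test point says little about the original task.

```python
def true_task(n):
    """The task we actually care about (hypothetical): is n even?"""
    return n % 2 == 0


test_points = [2, 3, 10, 7]

# "Tuning" taken to the extreme: memorise the right answer for each test point.
memorised = {n: true_task(n) for n in test_points}


def tuned_setting(n):
    """Answers correctly on memorised points and says False for anything else."""
    return memorised.get(n, False)


passes_all_test_points = all(tuned_setting(n) == true_task(n) for n in test_points)

# Fresh questions from the same task that were never used for tuning.
fresh_points = [4, 5, 8, 11]
fresh_accuracy = sum(
    tuned_setting(n) == true_task(n) for n in fresh_points
) / len(fresh_points)

print(passes_all_test_points, fresh_accuracy)  # True 0.5
```

A perfect score on the tuned points coexists with chance-level accuracy on new ones, which is why the tuned prompt is not yet evidence about the task itself.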

loa_in_ 1149 days ago

You can go as far as claiming that it's true for your colleagues as well! They've solved their piece of the job so far, but what evidence do you have that they will keep solving it in the future? It's all just speculation!

anonymous_sorry 1149 days ago

Informally, "this thing can solve all your problems" vs "for each of your problems there is a thing that can solve it".

I suppose the argument is that LLMs are not a solution to any problem, they are a complex tool which might be used to find a solution, with non-zero effort.

As an example of non-zero effort: I spent a fair amount of time the other day trying to get ChatGPT to advise how to effectively deal with a grey squirrel problem. It was more interested in telling me that squirrels should be treated humanely, to the extent that it suggested doing things that are illegal in my country (releasing a captured grey squirrel). I asked why and it told me all animals had a right to dignity and respect. I couldn't resist getting sidetracked by this. After some light trolling I asked it about how it had come to hold these values and it told me that as an LLM it didn't have values, but then restated its position anyway.

In the end I got some more sense out of it with a new prompt where I specifically said I was interested in effective, legal methods of control without any moralising.

If you're concerned, I haven't killed any squirrels and almost certainly won't.

jameshart 1149 days ago

Right, but this seems like strawmanning to me. The vast majority of useful technology ever developed has been "a complex tool which might be used to find a solution, with non-zero effort".

The complaint here seems to be about the existence of marketing.

mitchellh 1149 days ago

Since my blog post is linked, I wanted to clarify something. While this appears to be the broad message, I don't think the author intended to imply this about my post specifically, but I still feel the need to point out the following from my prompt engineering blog post (linked by the OP)[1]:

> To start, you must have a problem you're trying to build a solution for. The problem can be used to assess whether prompting is the best solution or if alternative approaches exist that may be better suited as a solution. Engineering starts with not using a method for the method's sake, but driven by the belief that it is the right method.

[1]: https://mitchellh.com/writing/prompt-engineering-vs-blind-pr...

cglong 1149 days ago

I ended up, ironically, asking ChatGPT for a summary of the article. That's the argument it derived too.