Hacker News
by elwell 716 days ago
> All these LLMs make up too much stuff, I don't see how that can be fixed.

All these humans make up too much stuff, I don't see how that can be fixed.

6 comments

I know you’re trying to be edgy here, but if I'm deciding between searching online for a source and taking a shortcut with GPT, and GPT hallucinates and makes something up, that’s the deceiving part.

The biggest issue is how confidently wrong GPT enjoys being. You can press GPT in either the right or the wrong direction and it will concede with minimal effort, which is also an issue. It’s just really bad Russian roulette nerdsniping until someone gets tired.

I wouldn't call it deceiving. In order to be motivated to deceive someone, you'd need agency and some benefit out of it.

1. Deception describes a result, not a motivation. If someone has been led to believe something that isn't true, they have been deceived, and this doesn't require any other agents.

2. While I agree that it's a stretch to call ChatGPT agentic, it's nonetheless "motivated" in the sense that it's learned based on an objective function, which we can model as a causal factor behind its behavior, which might improve our understanding of that behavior. I think it's relatively intuitive and not deeply incorrect to say that a learned objective of generating plausible prose can be a causal factor which has led to a tendency to generate prose which often deceives people, and I see little value in getting nitpicky about agentic assumptions in colloquial language when a vast swath of the lexicon and grammar of human languages writ large does so essentially by default. "The rain got me wet!" doesn't assume that the rain has agency.

Well the definition of deception, according to Google and how I understand it, is:

> deliberately cause (someone) to believe something that is not true, especially for personal gain.

Emphasis on the personal gain part. It seems like you have a different definition.

There's no point in arguing about definitions, but I'm a big believer that if you can identify a difference in the definitions people use early in a conversation, you can settle the argument right there.

I both agree that it's pointless to argue about definitions and think you've presented a definition that fails to capture a lot of common usage of the word. I don't think it matters what the dictionary says when we are talking about how a word is used. Like we use "deceptive" to describe inanimate objects pretty frequently. I responded to someone who thought describing the outputs of a machine learning model as deceiving people implied it had agency, which is nonsense.

Isn’t that GPT Plus? Trick you into thinking you have found your new friend and they understand everything? Surely OpenAI would like people to use their GPT over a Google search.

How do you think leadership at OpenAI would respond to that?

The problems of epistemology and informational quality control are complicated, but humanity has developed a decent amount of social and procedural technology to do these, some of which has defined the organization of various institutions. The mere presence of LLMs doesn't fundamentally change how we should calibrate our beliefs or verify information. However, the mythology/marketing that LLMs are "outperforming humans" combined with the fact that the most popular ones are black boxes to the overwhelming majority of their users means that a lot of people aren't applying those tools to their outputs. As a technology, they're much more useful if you treat them with what is roughly the appropriate level of skepticism for a human stranger you're talking to on the street

I wonder what ChatGPT would have to say if I ran this text through with a specialized prompt. Your choice of words is interesting, almost like you are optimizing for persuasion, but simultaneously I get a strong vibe of intention of optimizing for truth.

I think you'll find I'm quite horseshit at optimizing for persuasion, as you can easily verify by checking any other post I've ever made and the response it generally elicits. I find myself less motivated by what people think of me every year I'm alive, and less interested in what GPT would say about my replies each of the many times someone replies just to ponder that instead of just satisfying their curiosity immediately via copy-paste. Also, in general it seems unlikely humans function as optimizers natively, because optimization tends to require drastically narrowing and quantifying your objectives. I would guess that if they're describable and consistent, most human utility functions look more like noisy prioritized sets of satisfaction criteria than the kind of objectives we can train a neural network against

This on the other hand I like, very much!

Particularly:

> Also, in general it seems unlikely humans function as optimizers natively, because optimization tends to require drastically narrowing and quantifying your objectives. I would guess that if they're describable and consistent, most human utility functions look more like noisy prioritized sets of satisfaction criteria than the kind of objectives we can train a neural network against

Considering this, what do you think us humans are actually up to, here on HN and in general? It seems clear that we are up to something, but what might it be?

On HN? Killing time, reading articles, and getting nerdsniped by the feedback loop of getting insipid replies that unfortunately so many of us are constantly stuck in

In general? Slowly dying mostly. Talking. Eating. Fucking. Staring at microbes under a microscope. Feeding cats. Planting trees. Doing cartwheels. Really depends on the human

I would tend to agree!!

> Talking.

Have you ever noticed any talking that ~"projects seriousness &/or authority about important matters" around here?

FWIW I don't understand a lot of what either of you mean, but I'm very interested. Quick run-through, excuse the editorial tone, I don't know how to give feedback on writing without it.

# Post 1

> The problems of epistemology and informational quality control are complicated, but humanity has developed a decent amount of social and procedural technology to do these, some of which has defined the organization of various institutions.

Very fluffy, creating very uncertain parsing for reader.

Should cut down, then could add specificity:

ex. "Dealing with misinformation is complicated. But we have things like dictionaries and the internet, there's even specialization in fact-checking, like Snopes.com"

(I assume the specifics I added aren't what you meant, just wanted to give an example)

> The mere presence of LLMs doesn't fundamentally change how we should calibrate our beliefs or verify information. However, the mythology/marketing that LLMs are "outperforming humans"

They do, or are clearly at par, at many tasks.

Where is the quote from?

Is bringing this up relevant to the discussion?

Would us quibbling over that be relevant to this discussion?

> combined with the fact that the most popular ones are black boxes to the overwhelming majority of their users means that a lot of people aren't applying those tools to their outputs.

Are there unpopular ones that aren't black boxes?

What tools? (this may just indicate the benefit of a clearer intro)

> As a technology, they're much more useful if you treat them with what is roughly the appropriate level of skepticism for a human stranger you're talking to on the street

This is a sort of obvious conclusion compared to the complicated language leading into it, and doesn't add to the posts before it. Is there a stronger claim here?

# Post 2

> I wonder what ChatGPT would have to say if I ran this text through with a specialized prompt.

Why do you wonder that?

What does "specialized" mean in this context?

My guess is there's a prompt you have in mind, which then would clarify A) what you're wondering about B) what you meant by specialized prompt. But a prompt is a question, so it may be better to just ask the question?

> Your choice of words is interesting, almost like you are optimizing for persuasion,

What language optimizes for persuasion? I'm guessing the fluffy advanced verbiage indicates that?

Does this boil down to "Your word choice creates persuasive writing"?

> but simultaneously, I get a strong vibe of intention of optimizing for truth.

Is there a distinction here? What would "optimizing for truth" vs. "optimizing for persuasion" look like?

Do people usually write not-truthful things, to the point it's worth noting that when you think people are writing with the intention of truth?

As long as we're doing unsolicited advice, this revision seems predicated on the assumption that we are writing for a general audience, which ill suits the context in which the posts were made. This is especially bizarre because you then interject to defend the benchmarking claim I've called "marketing", and having an opinion on that subject at all makes it clear that you also at the very least understand the shared context somewhat, despite being unable to parse the fairly obvious implication that treating models with undue credulity is a direct result of the outsized and ill-defined claims about their capabilities to which I refer. I agree that I could stand to be more concise, but if you find it difficult to parse my writing, perhaps this is simply because you are not its target audience.

Let's go ahead and say the LLM stuff is all marketing and it's all clearly worse than all humans. It's plainly unrelated to anything else in the post, we don't need to focus on it.

Like I said, I'm very interested!

Maybe it doesn't mean anything other than what it says on the tin? You think people should treat an LLM like a stranger making claims? Makes sense!

It's just unclear what a lot of it means and the word choice makes it seem like there's something grander going on, coughs as our compatriots in this intricately weaved thread on the international network known as the world wide web have also explicated, and imparted via the written word, as their scrivening also remarks on the lexicographical phenomenae. coughs

My only other guess is you are doing some form of performance art to teach us a broader lesson?

There's something very "off" here, and I'm not the only one to note it. Like, my instinct is it's iterated writing using an LLM asked to make it more graduate-school level.

Your post and the one I originally responded to are good evidence against something I said earlier. The mere existence of LLMs does clearly change the landscape of epistemology, because whether or not they're even involved in a conversation people will constantly invoke them when they think your prose is stilted (which is, by the way, exactly the wrong instinct), or to try to posture that they occupy some sort of elevated remove from the conversation (which I'd say they demonstrate false by replying at all). I guess dehumanizing people by accusing them of being "robots" is probably as old as the usage of that word if not older, but recently interest in talking robots has dramatically increased and so here we are

I can't tell you exactly what you find "off" about my prose, because while you have advocated precision your objection is impossibly vague. I talk funny. Okay. Cool. Thanks.

Anyway, most benchmarks are garbage, and even if we take the validity of these benchmarks for granted, these AI companies don't release their datasets or even weights, so we have no idea what's out of distribution. To be clear, this means the claims can't be verified even by the standards of ML benchmarks, and thus should be taken as marketing, because companies lying about their tech has both a clearly defined motivation and a constant stream of unrelenting precedent

> There's something very "off" here

You mean on this planet?

If not, what do you think of that idea? Does something not seem....weird?

In reality, humans are often blunt and rude pessimists who say things can't be done. But "helpful chatbot" LLMs are specifically trained not to do that for anything but crude swaths of political/social/safety alignment.

When it comes to technical details, current LLMs have a bias towards sycophancy and bullshitting that humans only show when especially desperate to impress or totally fearful.

Humans make mistakes too, but the distribution of those mistakes is wildly different and generally much easier to calibrate for and work around.

Exactly, you can't even fix the problem at the root, b/c the problem is already with the humans, making up stuff.

Believe it or not, there are websites that have real things posted. This is honestly my biggest shock that OpenAI thought Reddit of all places is a trustworthy source for knowledge.

Reddit has been the most trustworthy source for me in the last ~5 years, especially when I want to buy something.

Reddit is so much better than the average SEO-optimized site that adding "reddit" to your search is a common trick for using Google.

While Reddit is often helpful for me (Google site:reddit.com), it's nice to toggle between reddit and non-reddit.

I hope LLMs will offer a "-reddit" model to switch to when needed.

The websites with content authored by people are full of bullshit, intentional and unintentional.

It’s genuinely concerning to me how many people replied thinking reddit is the gospel for factual information.

Reddit, while it has some niche communities with tribal info and knowledge, is FULL of spam, bots, companies masquerading as users, etc etc etc. If people are truly relying on reddit as a source of truth (which OpenAI is now being influenced by), then the world is just going to amplify all the spam that already exists.

If I am going to trust a machine then it should perform at the level of a very competent human, not a general human.

Why would I want to ask your average person a physics question? Of course, their answer will probably be wrong and partly made up. Why should that be the bar?

I want it to answer at the level of a physics expert. And a physics expert is far less likely to make basic mistakes.

advael's answer was fine, but since people seem to be hung up on the wording, a more direct response:

We have human institutions dedicated at least nominally to finding and publishing truth (I hate having to qualify this, but Hacker News is so cynical and post-modernist at this point that I don't know what else to do).

These include, for instance, court systems. These include a notion of evidentiary standards. Eyewitnesses are treated as more reliable than hearsay. Written or taped recordings are more reliable than both. Multiple witnesses who agree are more reliable than one.

Another example is science. Science utilizes peer review, along with its own notion of hierarchy of evidence, similar to but separate from the court's. Interventional trials are better evidence than observational studies. Randomization and statistical testing are used to try and tease out effects from noise. Results that replicate are more reliable than a single study.

Journalism is yet another example. This is probably the arena in which Hacker News is most cynical and will declare all of it is useless trash, but nonetheless reputable news organizations do have methods they use to try and be correct more often than they are not. They employ their own fact checkers. They seek out multiple expert sources. They send journalists directly to a scene to bear witness themselves to events as they unfold.

You're free to think this isn't sufficient, but this is how we deal with humans making up stuff and it's gotten us modern civilization at least, full of warts but also full of wonders, seemingly because we're actually right about a lot of stuff.

At some point, something analogous will presumably be the answer for how LLMs deal with this, too. The training will have to be changed to make the system aware of quality of evidence. Place greater trust in direct sensor output versus reading something online. Place greater trust in what you read from a reputable academic journal versus a Tweet. Etc. As it stands now, unlike human learners, the objective function of an LLM is just to produce a string in which each piece is in some reasonably high-density region of the probability distribution of possible next pieces as observed from historical recorded text. Luckily, producing strings in this way happens to generate a whole lot of true statements, but it does not have truth as an explicit goal and, until it does, we shouldn't forget that. Treat it the way it deserves: as if some human savant with perfect recall had never left a dark room to experience the outside world, but had read everything ever written, unfortunately without any understanding of the difference between reading a textbook and reading 4chan.
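
To make the "high-density region of the next-piece distribution" point concrete, here is a toy sketch, with loud assumptions: a bigram count table stands in for an LLM (real models are neural networks over subword tokens, not count tables), and the corpus, the function names, and the "textbook" vs "forum" labels are all made up for illustration. The only point is that nothing in the objective knows which source a sentence came from.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: one "textbook" sentence and one "forum" claim
# that appears twice. The source labels live only in these comments;
# the model below never sees them.
CORPUS = [
    "water boils at 100 degrees",  # textbook
    "water boils at 50 degrees",   # forum post
    "water boils at 50 degrees",   # same forum claim, repeated
]

def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_distribution(counts, prev):
    """Estimate P(next | prev) from raw counts."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}

def greedy_continue(counts, start, max_words=10):
    """Extend `start` by repeatedly picking the most frequent next word."""
    words = [start]
    while len(words) < max_words and counts[words[-1]]:
        words.append(counts[words[-1]].most_common(1)[0][0])
    return " ".join(words)

counts = train_bigram(CORPUS)
dist = next_word_distribution(counts, "at")
completion = greedy_continue(counts, "water")
print(dist)
print(completion)
```

In this toy setup the repeated forum claim outweighs the single textbook sentence simply because it occurs more often. Making the system "aware of quality of evidence" would mean getting source reliability into the training signal somehow, which this sketch deliberately does not attempt.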