| HN Mirror

	by adamsmith143 1280 days ago
	The problem with ChatGPT's "knowledge" is that it isn't trustworthy. It will happily output very confident-sounding nonsense or blatantly incorrect statements. We need a way to verify how accurate its outputs are.

7 comments

belter 1280 days ago

ChatGPT made this nice COBOL program to create an S3 bucket, a technical impossibility...

IDENTIFICATION DIVISION.
PROGRAM-ID. CREATE-S3-BUCKET.

ENVIRONMENT DIVISION.
CONFIGURATION SECTION.

INPUT-OUTPUT SECTION.

DATA DIVISION.
FILE SECTION.

WORKING-STORAGE SECTION.
01 AWS-ACCESS-KEY PIC X(20).
01 AWS-SECRET-KEY PIC X(40).
01 BUCKET-NAME PIC X(255).

PROCEDURE DIVISION.
CREATE-BUCKET.
    MOVE AWS-ACCESS-KEY TO AWS-ACCESS-KEY-VAR
    MOVE AWS-SECRET-KEY TO AWS-SECRET-KEY-VAR
    MOVE BUCKET-NAME TO BUCKET-NAME-VAR
    INVOKE AWS-S3 "CREATE-BUCKET" USING AWS-ACCESS-KEY-VAR AWS-SECRET-KEY-VAR BUCKET-NAME-VAR

lmm 1280 days ago

How is that impossible? Plenty of libraries are available for COBOL, especially if you use COBOL.NET

belter 1280 days ago

What is the COBOL SDK for AWS?

lmm 1280 days ago

Probably someone sells one, or you'd just use the AWS SDK for .NET via COBOL.NET.

antonvs 1279 days ago

There are HTTP client libraries for COBOL, and it’s easy to use HTTP to make S3 API calls.

roperj 1280 days ago

What is the technical impediment to writing one?

EamonnMR 1280 days ago

"Eww, Cobol"

viscanti 1280 days ago

Just ask ChatGPT to implement it.

taylorius 1280 days ago

That was ChatGPT's first response, yes.

aeternum 1280 days ago

Whose knowledge is trustworthy? We've somehow come to associate certain institutions or scientific authorities with truth when that is about the furthest from real science:

"Have no respect whatsoever for authority; forget who said it and instead look what he starts with, where he ends up, and ask yourself, Is it reasonable?" -Richard P. Feynman

"One of the great commandments of science is, 'Mistrust arguments from authority.'" -Carl Sagan

"In questions of science, the authority of a thousand is not worth the humble reasoning of a single individual." -Galileo Galilei

burnished 1280 days ago

I think part of the issue is that it’s easier to test the limits of a human’s knowledge. Ironically, with your quotes I think you’ve supplied evidence that trust is crucial: the truest expression of those quotes would be to just deliver the payload and not attach any sort of authority by association to it.

You can’t trust its answers (to be fair, that’s the existing status quo), but you also can’t easily test it because it will return reasonable-sounding garbage. Conversely, you can discover ignorance in most humans pretty quickly by exhausting their ability to respond (or your ability to ask).

visarga 1280 days ago

A generative system, be it a neural network or a human, needs a way to test ideas in order to align with reality. If testing is available, then it is possible to advance the state of the art. Ideas are cheap, results matter.

burnished 1279 days ago

Sure, but that doesn’t seem to square with the topic at hand - “why does an infinite truth and lies machine feel less trustworthy than another human”. It just isn’t a question that needs a high degree of abstraction to respond to.

visarga 1279 days ago

It is sometimes a lie machine because it lacks grounding in verification. Humans get more grounding than language models but even we are not 100% there - remember the antivax hysteria. The most grounded field is science, but even in scientific papers most things don't replicate. Verification is hard on all levels and requires extensive work. In particle physics all scientists clump together around the CERN accelerator as it is the only source of verification they have (almost, I exaggerate a bit).

It's going to be important to develop AI methods to test and verify; I think unverified model outputs are worthless verbiage. Verification can be based on references, code execution, physical simulations, lab experiments and even language-based simulations.

In a few years the situation is going to flip: AI is going to become more reliable than humans. Being tested on millions of cases, it will be more trustworthy than us; no human can be tested to that extent. It's going to be interesting to see how we react to super-valid AI. Our guiding role is going to shrink more and more; we will be the children.

baandang 1279 days ago

Those quotes are not about trust, they are about the rhetorical technique of appeal to authority.

The actual payload though is the mistrust of authority exactly because we are all so susceptible to the logical fallacy of appeal to authority masquerading bullshit as truth.

There is no problem to solve here. ChatGPT should never be an authority on anything.

ajuc 1279 days ago

I, too, always recreate double blind experiments before I take the drugs my doctor gives me :)

I also double-check the transistors in my computer work correctly before I run any code on them, and of course I re-derive the physics to be able to do that :)

In practice you are an expert in a very small domain (if any) and in all the other domains you have no choice but to accept somebody's authority.

aeternum 1279 days ago

That's good, I don't go quite as far, but do try to consult multiple independent sources.

Doctors have been known to overprescribe things like benzos and opioids from time to time.

I also just use tools like a RAM diagnostic that can check large numbers of transistors at once. I imagine you're quite good at QM after all that practice applying the wave equation though. Impressive!

hippich 1280 days ago

There is some kind of recursion here with the authors' names and the "have no respect..." part :)

aeternum 1280 days ago

Fair point

AnthonyMouse 1280 days ago

Actually, it isn't.

Their arguments are of the form "this statement could be false."

If you evaluate it as a true statement, you have no problems. It could be false; you have to evaluate it for yourself instead of trusting some authority.

It's only if you assert that it's certainly false that you have a problem. Because then it's clearly true -- since otherwise these authorities would be telling you something false, which proves their assertion true.

Put another way, it can get you from the undesirable position of blindly trusting authorities to the desirable position of questioning them, but not the other way around. Which is the intended result.

alasdair_ 1279 days ago

It’s slightly ironic that the only reason we pay attention to these particular quotes is that they come from famous physicists, i.e. authorities.

jtxt 1280 days ago

One way I tried to do this is by having it write an answer with a footnote reference at each fact [1], then list search terms that can be used to verify each claim. I would then respond with the URL and quotes from the pages found for each one, and have it rewrite the answer based on that information and cite the sources. I think something in this direction can be automated. I saw someone do this with math and other tasks, where the model would talk to a connected program before answering.
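
A minimal sketch of the checking step in that loop, assuming the footnoted claims and the quotes found for each footnote have already been collected by hand or by a search tool. The names `check_claims` and `tokens`, the 0.6 threshold, and the word-overlap score are all assumptions made for illustration; word overlap is a crude stand-in for a real judgment about whether a quote supports a claim.

```python
import re


def tokens(text):
    """Lowercase word and number tokens of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def check_claims(claims, quotes, threshold=0.6):
    """Map each footnote number to a supporting URL, or None.

    claims: dict of footnote number -> claim text
    quotes: dict of footnote number -> list of (url, quote) pairs
    A quote "supports" a claim here only in the crude sense that it
    contains at least `threshold` of the claim's words (an assumption).
    """
    report = {}
    for number, claim in claims.items():
        claim_tokens = tokens(claim)
        best_url, best_score = None, 0.0
        for url, quote in quotes.get(number, []):
            if not claim_tokens:
                continue
            score = len(claim_tokens & tokens(quote)) / len(claim_tokens)
            if score > best_score:
                best_url, best_score = url, score
        report[number] = best_url if best_score >= threshold else None
    return report


# Made-up example data; the URLs are placeholders.
claims = {
    1: "Water boils at 100 degrees Celsius at sea level",
    2: "The bridge was painted green in 1950",
}
quotes = {
    1: [("https://example.org/a",
         "At sea level, water boils at 100 degrees Celsius.")],
    2: [("https://example.org/b", "Nothing relevant on this page.")],
}
report = check_claims(claims, quotes)
```

Claims that come back as `None` are the ones to send back to the model for a rewrite, or to drop.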

visarga 1280 days ago

Yes, it's been done both in papers and in various GPT-3 projects. As long as you can find relevant references the LM will become reliable.

adamsmith143 1279 days ago

I did this as well and it looks great initially but there are already examples of GPT generating totally bogus references and sources. So we're back to square 1.

politician 1280 days ago

Sounds like an interesting way to reboot Wikipedia.

rdedev 1280 days ago

I just had a run-in with this yesterday. I asked it to explain box embeddings. It's a pretty niche topic, so I didn't expect it to give the right answer. But the answer it gave sounded so confident and was so wrong. It took a normal vector embeddings approach but replaced that with box. I tried correcting it, but it refused to budge and still sounded confident.

matthewdgreen 1280 days ago

I asked it to explain part of my thesis work on Oblivious Transfer, and it gave me a lovely prose description of the Green-Hohenberger Oblivious Transfer protocol. It was clear and confident, and the thing it described was even an actual protocol. It just wasn’t in any way our protocol: GPT just took some classical protocol it found elsewhere and relabeled it.

kyle_grove 1280 days ago

Sounds like many humans I know.

moffkalast 1280 days ago

ChatGPT to be employed in marketing positions immediately.

eternalban 1280 days ago

Think bigger. PresidentGPT. On tweeter!

visarga 1280 days ago

You are right, this is the pain point - trust, verification. I think it will become the next focus of research.

There are many things we could do to solve this problem. One of them is to use an external reference for verification. Another one is to train the model to verify facts by augmenting the input with lies - adversarial training for lie detection. Problem solving can be improved by generating more data with the current version of LM for the next one, if we can verify the outputs to be correct.

lossolo 1280 days ago

Sure, but you can only verify facts like "when was <someone> born?". You can verify this easily today with a knowledge database, but that's not what is interesting in ChatGPT. What's interesting is what it can generate that you can't easily fact-check, like "generate me a poem in the style of <someone> and <someone>": how can you verify automatically that the style is correct? Or "write me code that connects to a non-public system and does <here long instruction in words>": how can you verify that this code works properly without access to that system and the ability to run it yourself?
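
To make the distinction concrete, here is a rough sketch of the easy case, assuming claims can be reduced to (subject, attribute, value) triples and that a lookup table stands in for the knowledge database. `Verdict`, `verify`, and the triple format are hypothetical names chosen for illustration. Anything that does not reduce to a triple in the table, such as a poem's style or code for a private system, simply comes back as unverifiable.

```python
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    CONTRADICTED = "contradicted"
    UNVERIFIABLE = "unverifiable"


def verify(claim, knowledge_base):
    """Check one (subject, attribute, value) claim against a lookup table.

    knowledge_base: dict of (subject, attribute) -> known value.
    Returns UNVERIFIABLE when the table has no entry for the claim.
    """
    subject, attribute, value = claim
    known = knowledge_base.get((subject, attribute))
    if known is None:
        return Verdict.UNVERIFIABLE
    return Verdict.SUPPORTED if known == value else Verdict.CONTRADICTED


# Tiny stand-in for a real knowledge database.
knowledge_base = {("Galileo Galilei", "birth_year"): 1564}

right = verify(("Galileo Galilei", "birth_year", 1564), knowledge_base)
wrong = verify(("Galileo Galilei", "birth_year", 1600), knowledge_base)
unknown = verify(("some poem", "style_is_correct", True), knowledge_base)
```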

palata 1280 days ago

> There are many things we could do to solve this problem.

Just like what social networks have failed to do in years? Not sure it's that simple :-)

nathias 1280 days ago

so, much like other knowledge sources?

roywiggins 1280 days ago

Most knowledge sources don't make up totally fictional citations to nonexistent sources. Or, if they do, nobody uses them for anything serious. Even Wikipedia citations will get removed if they point to URLs that never existed.

nathias 1279 days ago

if we focus on the best sources, even in studies a lot of research can't be replicated, and if we focus on the most common ones like newspapers and tv, I'd say most of it is made up or might as well be

roywiggins 1278 days ago

Sure, nothing is perfect.

But I'm not talking about it just being wrong, I'm talking about it citing webpages and books that don't exist and never did[0]. If Wikipedia regularly had that sort of quality issue people just wouldn't use it. There's a threshold below which something stops being useful.

[0] Bloggs, Joe. "ChatGPT just makes stuff up". Nature, vol 123, 2022, pp 123-321. Wiley Online Library, https://doi.org/10.1111/111/111
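
A cited URL that never existed is at least cheap to detect. Below is a minimal sketch using Python's standard `urllib`, assuming a HEAD request is an acceptable probe; `url_resolves` and its `opener` parameter are hypothetical names, and the opener is injectable so the check can be exercised without network access. A URL that resolves only shows that the page exists, not that it says what the citation claims.

```python
import urllib.error
import urllib.request


def url_resolves(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if a HEAD request to `url` succeeds, False otherwise.

    Malformed URLs, HTTP errors (e.g. 404), and network failures all
    count as "does not resolve".
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses;
        # ValueError covers strings that are not URLs at all.
        return False
```

Some servers reject HEAD requests, so a real checker would probably fall back to GET before declaring a link dead.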

adamsmith143 1279 days ago

That's just a bad take and it doesn't excuse the problems with GPT.

adamsmith143 1279 days ago

Ok, but if I read a paper from a well-known author published in NeurIPS or Nature, I have a good sense of how trustworthy that paper might be. Even if we ask GPT to cite its sources, which it will do, it will then also happily generate false sources. It's untrustworthy turtles all the way down.
