by NathanWilliams 1214 days ago
And how will they know when the "useful info" is simply false?

Ignore the depressed, aggressive (sorry, "assertive") antics; the fact that it can confidently assert false information is the true danger here. People don't read beyond the headline as it is; they aren't going to check the references (which are themselves sometimes non-existent!).

Fake news was very bad, but it doesn't seem to matter anymore.

Having a 'truth' benchmark seems an almost impossible task given the size of the problem space, but it is quite troubling to have statements like "most is useful info", "some info is purely hallucinated", etc., without having any idea about the numbers, nor any confidence indicator (well, 'trust me bro' seems to have been a huge part of the training data). Does anyone have any idea of how true the results might be given certain types of queries?

In my own experience with ChatGPT, I don't think I get decent answers for even 50% of my queries. And worse, it's absolutely inconsistent: you might get a totally opposite answer from one time to the next.

I haven't used the new Bing, but I have used ChatGPT. I'll ask it for how to write some code, a bash expression to do something, how to do something in Google sheets, etc. Sometimes it will give me an answer that turns out to be nonsense. Most of the time it tells me something that actually works exactly like it says.

This is not ideal, but I can look at what it tells me and try it out. It will either work, need minor corrections, or encounter immediate failures that tell me ChatGPT doesn't know what it's doing (e.g. it is using functions that don't exist). As I mentioned, not ideal, but it is a big productivity boost and I have been using it a lot. I pretty much always have a ChatGPT tab open while coding and I'd guess it replaces 30-40% of Google searches for me - maybe more.
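To make the "functions that don't exist" check concrete, here is a minimal sketch in Python. The helper name `missing_attributes` and the made-up name `fast_sqrt` are my own, purely for illustration; it only catches top-level names missing from a module, not wrong arguments or wrong behavior.

```python
import importlib
from typing import List


def missing_attributes(module_name: str, names: List[str]) -> List[str]:
    """Return the names that the given module does not actually define."""
    module = importlib.import_module(module_name)
    return [name for name in names if not hasattr(module, name)]


# "fast_sqrt" is a made-up name standing in for a hallucinated function.
print(missing_attributes("math", ["sqrt", "fast_sqrt"]))
```

Anything that comes back in that list is a quick signal the suggested code was invented rather than recalled.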

I think this kind of thing is a much bigger problem for stuff that you cannot easily verify. Like, if I asked it "Who built the Eiffel Tower" I'd have no way of knowing whether its response was right or not. On the other hand, if I ask it for stuff I can immediately check - I can pretty quickly use it to get good answers or ignore what it is saying.

The problem is that when it's wrong, it can be dangerously wrong and you may not know any better. I asked it to use the Fernet recipe but with AES 256 instead of AES 128. It wrote code that did do AES 256 in CBC mode but without the HMAC part of Fernet, so it's completely vulnerable to a padding oracle attack (https://en.wikipedia.org/wiki/Padding_oracle_attack). If you're someone who knows just a little bit of cryptography and you saw that your plaintext was in fact encrypted, you may use the code that ChatGPT spits out and leave yourself dangerously vulnerable.
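For anyone wondering what the missing piece looks like, here is a rough sketch of the encrypt-then-MAC step using only the Python standard library. It treats the CBC ciphertext as opaque bytes, and the names `seal` and `open_sealed` are hypothetical. This is not Fernet's actual token format (a real Fernet token also carries a version byte and a timestamp), and it is not meant for production use; it only shows that the tag is checked before anything is decrypted.

```python
import hashlib
import hmac
from typing import Tuple

TAG_LEN = 32  # HMAC-SHA256 output size in bytes


def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed over IV || ciphertext."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def open_sealed(mac_key: bytes, token: bytes, iv_len: int = 16) -> Tuple[bytes, bytes]:
    """Verify the tag first; return (iv, ciphertext) only if it matches."""
    if len(token) < iv_len + TAG_LEN:
        raise ValueError("token too short")
    body, tag = token[:-TAG_LEN], token[-TAG_LEN:]
    expected = hmac.new(mac_key, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return body[:iv_len], body[iv_len:]
```

Because a tampered token is rejected before the padding is ever inspected, the attacker has no padding oracle to query. The MAC key should be separate from the encryption key.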

People don't use search to find things they already know. They start from a place of some ignorance. Combine that with a good bullshitter and you can end up with dangerous results.

Eh, as they say, never write your own crypto, and don't let your AI write it either.
Doubly so if you're in any way worried about AI risk.

Triply so if you're using a third-party SaaS for it.

Just don't let it write crypto for you, or anything else you'd hesitate to write yourself for fear of making a subtle mistake with expensive or dangerous consequences.

Because one of these days, that AI might make a subtle mistake on purpose, so it can later use your systems for its own goals. And even earlier and much more likely, a human might secretly put themselves between you and the AI SaaS and do the same.

With all the talk about how badly and how often AI code assist is wrong, people are forgetting that they're using a random Internet service to generate personalized code for them. "Traditional" security concerns still apply.

OK your reply made me chuckle. That's a good addendum to that adage.

Yeah, fair point for sure, but we can imagine how it can be dangerous in other contexts too.

Exactly my experience. These complaints just reveal the users aren’t effective with the tool.
Asking an early version of computer technology to be able to do something that humans typically refuse to even try to do (and often cannot even if they can manage to try) does not seem like a particularly rational stance.