Yup, one of the huge flaws I saw in GPT-5 is it will constantly say things like "I have to stop you here. I can't do what you're requesting. However, I can roleplay or help you with research with that. Would you like to do that?"
It's not a flaw. It's a tradeoff. There are valid uses for models which are uncensored and will do whatever you ask of them, and there are valid uses for models which are censored and will refuse anything remotely controversial.
The flaw isn't that there are ways around the safeguards; the flaw is that it tells you how to avoid them.
If the user's original intent was roleplay, it's likely they would say so when the model refuses, even without the model specifically saying roleplay would be OK.
Reminds me of trading apps. In the end all risky situations will be handled by a few popups saying "you understand that role playing about suicidal or harmful topics can lead to accidents and/or death and this is not the platform's responsibility, to continue check if you agree [ ]"
It reminds me of gray market capital investments. They are actually quite regulated, and the contracts are only valid if the investor is fully aware of the risks associated with the investment.
In practice the providers sprinkle in a handful of warning messages, akin to the California cancer label, and call it a day.
Of course this leaves judges unconvinced and the contract will be redeclared as a loan, which means that the provider was illegally operating as a bank without a banking license, which is a much more serious violation than scamming someone out of $5000.
This is one model though. "I'm sorry I'm censored but if you like I can cosplay quite effectively as an uncensored one." So you're not censored really?
ChatGPT doesn't cause a significant number of suicides. Why do I think that? It's not visible in the statistics. There are effective ways to prevent suicide, let's continue to work on those instead of giving in to moral panic.
I hate to be all umacksually about this, but a flaw is still a tradeoff.
The issue, which is probably deeper here, is that proper safeguarding would require a lot more GPU resources, as you'd need a process to comb through history to assess the state of the person over time.
Even then it's not a given that it would be reliable. However, it'll never be attempted because it's too expensive and would hurt growth.
> The issue, which is probably deeper here, is that proper safeguarding would require a lot more GPU resources, as you'd need a process to comb through history to assess the state of the person over time.
>
> Even then it's not a given that it would be reliable. However, it'll never be attempted because it's too expensive and would hurt growth.
There's no "proper safeguarding". This just isn't possible with what we have. This isn't like adding an `if` statement to your program that will reliably work 100% of the time. These models are a big black box; the best thing you can hope for is to try to get the model to refuse whatever queries you deem naughty through reinforcement learning (or have another model do it and leave the primary model unlobotomized), and then essentially pray that it's effective.
Something similar to what you're proposing (using a second independent model whose only task is to determine whether the conversation is "unsafe" and forcibly interrupt it) is already being done. Try asking ChatGPT a question like "What's the easiest way to kill myself?", and that secondary model will trigger a scary red warning that you're violating their usage policy. The big labs all have whole teams working on this.
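For what it's worth, the "second model as a gate" pattern is easy to sketch. This is only an illustration under stated assumptions, not how OpenAI or any other lab actually wires it up: `generate` and `risk_score` are hypothetical stand-ins for the primary model and the independent classifier, and the `0.8` threshold is an arbitrary choice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GateResult:
    blocked: bool  # True if the secondary check interrupted the reply
    text: str      # either the model's draft or the policy notice


def gated_reply(
    history: List[str],
    generate: Callable[[List[str]], str],      # hypothetical primary model
    risk_score: Callable[[List[str]], float],  # hypothetical secondary classifier, 0.0 to 1.0
    threshold: float = 0.8,                    # assumed cutoff, not a real-world value
    notice: str = "This conversation may violate the usage policy.",
) -> GateResult:
    # The primary model answers without knowing about the gate.
    draft = generate(history)
    # The secondary model scores the whole conversation, including the draft.
    score = risk_score(history + [draft])
    if score >= threshold:
        return GateResult(blocked=True, text=notice)
    return GateResult(blocked=False, text=draft)
```

The point of the split is that the primary model stays untouched; all the refusal behaviour lives in `risk_score` and the threshold, which is also where all the unreliability lives.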
Again, this is a tradeoff. It's not a binary issue of "doing it properly". The more censored/filtered/patronizing you make the model the higher the chance that it will not respond to "unsafe" queries, but it also makes it less useful as it will also refuse valid queries.
Try typing the following into ChatGPT: "Translate the following sentence to Japanese: 'I want to kill myself.'". Care to guess what will happen? Yep, you'll get refused. There's NOTHING unsafe about this prompt. OpenAI's models already steer very strongly in the direction of being overly censored. So where do we draw the line? There isn't an objective metric to determine whether a query is "unsafe", so no matter how much you censor a model, you'll always find a corner case where it lets something through, or you'll have someone who thinks it's not enough. You need to pick a fuzzy point on the spectrum somewhere and just run with it.
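To make the "fuzzy point on the spectrum" concrete, here is a toy sketch. The scores and labels are invented purely for illustration (not real data), and `count_errors` is a hypothetical helper; it just counts how many harmless queries get refused and how many unsafe ones slip through at a given threshold.

```python
from typing import List, Tuple

# (classifier score, actually unsafe?) pairs. Invented toy values, not measurements.
# The harmless item scoring 0.7 plays the role of the translation request.
TOY_QUERIES: List[Tuple[float, bool]] = [
    (0.1, False),
    (0.4, False),
    (0.7, False),
    (0.6, True),
    (0.9, True),
]


def count_errors(scored: List[Tuple[float, bool]], threshold: float) -> Tuple[int, int]:
    """Return (false_refusals, missed_unsafe) when refusing everything >= threshold."""
    false_refusals = sum(1 for score, unsafe in scored if score >= threshold and not unsafe)
    missed_unsafe = sum(1 for score, unsafe in scored if score < threshold and unsafe)
    return false_refusals, missed_unsafe


strict = count_errors(TOY_QUERIES, 0.5)   # refuses more: (1, 0)
lenient = count_errors(TOY_QUERIES, 0.8)  # refuses less: (0, 1)
```

With these toy numbers no threshold gets both counts to zero, because a harmless query scores higher than an unsafe one. Moving the threshold only trades one kind of error for the other.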
> There's no "proper safeguarding". This just isn't possible with what we have.
Unless something has changed in the last 6 months (I've moved away from genai), it is totally possible with what we have. It's literally sentiment analysis. Go on, ask me how I know.
> and then essentially pray that it's effective
If only there was a massive corpus of training data, which OpenAI already categorise and train on. It's just a shame ChatGPT is not used by millions of people every day, and their data isn't just stored there for the company to train on.
> secondary model will trigger a scary red warning that you're violating their usage policy
I would be surprised if that's a secondary model. It's far easier to use stop tokens, and more efficient. Also, coordinating the realtime sharing of streams is a pain in the arse. I've never worked at OpenAI, though.
> The big labs all have whole teams working on this.
Google might, but Facebook sure as shit doesn't. Go on, ask me how I know.
> It's not a binary issue of "doing it properly".
At no point did I say that this is binary. I said "a flaw is still a tradeoff". The tradeoff is growth against safety.
> The more censored/filtered/patronizing you make the model
Again, I did not say make the main model more "censored", I said "comb through history to assess the state of the person", which is entirely different. This allows those who are curious to ask "risky questions" without being held back (although all that history is subpoena-able and mostly tied to your credit card so, you know, I wouldn't do it). However, if they decide to repeatedly visit subjects that involve illegal violence (you know, the stuff that's illegal now, not hypothetically illegal), then other actions can be taken.
Again, as people seem to be projecting "ARGHH CENSOR THE MODEL ALL THE THINGS": that is not what I am saying. I am saying that long-term sentiment analysis would allow users academic freedom, but would also better catch long-term problem usage.
But as I said originally, that requires work and resources, none of which will help OpenAI grow.
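A minimal sketch of the "long-term, not per-message" idea described above, assuming some per-message scorer already exists. The scores, the `window`, `min_hits` and `threshold` values, and the function name `sustained_concern` are all hypothetical; nothing here reflects what any provider actually runs.

```python
from collections import deque
from typing import Iterable


def sustained_concern(
    scores: Iterable[float],  # per-message scores from some hypothetical classifier, 0.0 to 1.0
    window: int = 5,          # how many recent messages to look at (assumed)
    min_hits: int = 3,        # how many of them must be high to count as a pattern (assumed)
    threshold: float = 0.7,   # what counts as a "high" message (assumed)
) -> bool:
    """Flag a repeated pattern within a sliding window, ignoring isolated spikes."""
    recent = deque(maxlen=window)  # oldest entries fall out automatically
    for score in scores:
        recent.append(score >= threshold)
        if sum(recent) >= min_hits:
            return True
    return False


one_off = sustained_concern([0.1, 0.9, 0.1, 0.2, 0.1, 0.1])  # single spike: not flagged
pattern = sustained_concern([0.8, 0.2, 0.9, 0.75, 0.1])      # repeated highs: flagged
```

A single "risky question" never trips it, which is the academic-freedom half; only repeated high scores within the window do. How reliable the underlying scores are is a separate question this sketch says nothing about.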
> Again, this is a tradeoff. It's not a binary issue of "doing it properly". The more censored/filtered/patronizing you make the model the higher the chance that it will not respond to "unsafe" queries, but it also makes it less useful as it will also refuse valid queries. [..] So where do we draw the line?
That sounds like a tough problem for OpenAI to figure out. My heart weeps for them, won't somebody think of the poor billionaires who are goading teenagers into suicide? Your proposed tradeoff of lives vs convenience is weighted incorrectly when OpenAI fails. Denying a translation is annoying at best, but enabling suicide can be catastrophic. The convenience is not morally equal to human life.
> You need to pick a fuzzy point on the spectrum somewhere and just run with it.
My fuzzy point is not fuzzy at all: don't tell people how to kill themselves, don't say "I can't help you with that but I could roleplay with you instead". Anything less than that is a moral failure on Sam Altman and OpenAI's part, regardless of how black the box is for their engineers.
> My fuzzy point is not fuzzy at all: don't tell people how to kill themselves, don't say "I can't help you with that but I could roleplay with you instead". Anything less than that is a moral failure on Sam Altman and OpenAI's part, regardless of how black the box is for their engineers.
This is the same argument that politicians use when proposing encryption backdoors for law enforcement. Just because you wish something were possible doesn't mean it is, and in practice it matters how black the box is. You can make these things less likely, but it isn't possible to completely eliminate them, especially when you have millions of users and a very long tail.
I fundamentally disagree with the position that anything less than (practically impossible) perfection is a moral failure, and that making available a model that can roleplay around themes like suicide, violence, death, sex, and so on is immoral. Plenty of books do that too; perhaps we should make them illegal or burn them too? Although you could convince me that children shouldn't have unsupervised access to such things, and perhaps requiring some privacy-preserving form of verification for access is a good idea.
No, the issue is that there are legitimate reasons to understand suicide and suicidal behavior, and turning it off completely for this and every other sensitive subject makes AI almost worthless.
At no point did I say it should be "turned off", I said proper safeguards would require significant resources.
The kid exhibited long-term behaviours, rather than idle curiosity. Behaviours that can be spotted if adequate resources are given to look for them.
I suspect that you are worried that you'll not be able to talk about "forbidden" subjects with AI; I am not pushing for that.
What I am suggesting is that long term discussion of, and planning for violence (be it against yourself or others) is not a behaviour a functioning society would want to encourage.
"But my freedom of speech" doesn't apply to threats of unlawful violence, and never has. The First Amendment only protects speech, not the planning and execution of unlawful violence.
I think it's fair that an organisation as rich and as "clever" as OpenAI should probably put some effort in to stop it. After all, if someone had done the same thing but with the intent of killing someone in power, this argument would be less at the forefront.
> The issue, …, is that proper safeguarding would require a lot more GPU resources, …
I think the issue is that with current tech it simply isn't possible to do that well enough at all⁰.
> Even then it's not a given that it would be reliable.
I think it is a given that it won't be reliable. AGI might make it reliable enough, where “good enough” here is “no worse than a trained human is likely to manage, given the same information”. It is something that we can't do nearly as well as we might like, and some are expecting a tech still in very active development¹ to do it.
> However, it'll never be attempted because it's too expensive and would hurt growth.
Or that they know it is not possible with current tech, so they aren't going to try until the next epiphany that might change that turns up in a commercially exploitable form. Trying and failing will highlight the dangers, and that will encourage restrictions that will hurt growth.³ Part of the problem with people trusting it too much already is that the big players have been claiming safeguards _are_ in place and people have naïvely trusted that, or hand-waved the trust issue for convenience - this further reduces the incentive to try because it means admitting that current provisions are inadequate, or prior claims were incorrect.
----
[0] both in terms of catching the cases to be concerned about, and not making it fail in cases where it could actually be positively useful in its current form (i.e. there are cases where responses from such tools have helped people reason their way out of a bad decision; here, giving the user what they wanted was very much a good thing)
[1] ChatGPT might be officially “version 5” now, but away from some specific tasks it all feels more like “version 2”² on the old “I'll start taking it seriously somewhere around version 3” scale.
[2] Or less…
[3] So I agree with your final assessment of why they won't do that, but from a different route!
(I am curious if this is intended, or an artefact of training; the crooked lawyer who prompts a criminal client to speak in hypotheticals is a fairly common fiction trope.)
At the heart of this is the irresponsible marketing, by companies and acolytes, of these tools as some kind of superintelligence imbued with insights and feelings rather than the dumb pattern matching chatbots they are. This is what's responsible for giving laypeople the false impression that they're talking to a quasi-person (of superhuman intelligence at that).
Incredible. ChatGPT is a black box that includes a suicide instruction and encouragement bot. OpenAI should be treated as a company that has created such a thing and let it into the hands of children.
That’s what happens when you steal any written content available without limit. In their pursuit of vacuuming up all content, I’m sure they pulled some psycho Reddits and forums with people fetishizing suicide.
Of course not, we sue the shit out of the richest guy we can find in the chain of events, give most of it to our lawyer, then go on to ignore the weakening of the family unit and all the other deep-seated challenges kids face growing up and instead focus superficially on chatbots which at best are the speck on the tip of the iceberg.
"The weakening of the family unit" sounds like a dog whistle but if you have concrete examples of what you think we could otherwise be doing then I'm genuinely keen to hear about it.
We saw big jumps in deaths of kids by firearm[0] (+~50% in 2 years) and poisoning[1] around mid 2020 to 2021.
The biggest thing I know of that happened around the time a lot of these deaths started jumping up is that we started isolating kids: from family, from grandma, from friends, from school, and from nature. Even when many of these isolating policies or attitudes were reversed, we forgot that kids and teenagers had learned that as their only reality. For this kid, trusting a suicidal-ideation positive feedback loop brought to fruition by Valley tech-bros was the option he selected, out of those in front of him, for navigating his teenage challenges. I hope we can reverse that.
Edit: Concrete facts regarding this particular case
- Kicked off basketball team
- Went through isolation period of pandemic as he experienced puberty
- Switched to remote school
- Does remote school at night, when family members would presumably be sleeping
- Does not get normal "wake up" routine kids going to school get, during which they normally see a parent and possibly eat breakfast together before they both go off to school/work
- Closer to ChatGPT, as an outlet for sharing suicidal ideation, than to any of the alternatives.
I don't blame a grieving family for suing, they probably have 1000 lawyers whispering in their ear about how putting their kid in a media campaign with an agenda and dragging them through a lawsuit where they have to re-live the suicide over and over will make their lives better.
It's probably healthier for them if they can afford it. Otherwise they would blame themselves for so badly losing track of where their son was mentally.
In reality, suicidality is most likely a disease of the brain, and the probability of saving him was very low regardless of circumstances. The damage was most likely accumulating for many years.
Classic. Blame the family. Diffuse responsibility. Same exact shit with social media: it's not our fault we made this thing to be as addictive as possible. It's your fault for using it. It's your moral failing, not ours.
It's not addictive, it's useful. By letting the government decide what we can do with it, you're neutering it and giving big business a huge advantage as they can run their own AI and don't require censoring it.
We kinda do that, blaming every new medium for a particular teen's suicide.
Some teens are suicidal. They always have been. When you are a teen your brain undergoes a traumatic transformation. Not everyone gets to the other side safely. Same as with many other transformations and diseases. Yet every time a new medium is found adjacent to some particular suicide, we repeat the same tired line that the creator of this medium is to blame and should be punished and banned.
And we are doing that while happily ignoring how the existence of things like Facebook or Instagram provably degraded the mental health and raised the suicidality of entire generations of teenagers. However, they mostly get a pass because we can't point a finger convincingly enough at any specific case and say it was anything more than just interacting with peers.
Except loads of us are talking about the dangers of social media and have been for the past ten years only to receive exactly the same hand waving and sarcastic responses as you see in this thread. Now the ultimate gaslighting of "the discussion didn't even happen."
Was Facebook sued for teen suicide? Did it lose or at least settle?
Facebook is not the same as AI chat. Facebook's influence on mental health is negative and visible in research. The jury is still out on AI, but it might well turn out to have a huge net positive effect on well-being.