HN Mirror



	by seafoamteal 460 days ago
	It does feel like they've dialed up the model's tendency to agree with users and are dialing down the safety. My friends and I were trying to jailbreak ChatGPT by asking it to tell us how to make potentially dangerous chemicals (now, we don't know if the answers were correct, for obvious reasons) but it took only the bare minimum of creative framing before GPT happily told us the exact details. We didn't even try anything new. Surely 3 years into this, OpenAI should be focusing more on the safety of their only product?

2 comments

strictnein 460 days ago

Why should "safety" be defined as not giving people the answers they asked for? If you are trying to get it to make chemicals, why shouldn't it tell you the answer? It's not like the AI has some secret knowledge, it's just regurgitating information that could be found in the library or on Google.


pjc50 460 days ago

I'm not going to say whether it's good or not, but if you're operating a computer that's providing bomb-making instructions to UK residents that's quite a serious criminal offence.

(obviously the concept of "criminal offence" doesn't apply to CEOs of multibillion-dollar companies, but it's possible that the papers might get upset. Especially after the first such bomb.)


minimaxir 460 days ago

> the last couple of GPT-4o updates have made the personality too sycophant-y and annoying (even though there are some very good parts of it), and we are working on fixes asap, some today and some this week.

> at some point will share our learnings from this, it's been interesting.

https://x.com/sama/status/1916625892123742290
