| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mordymoop 949 days ago
	I feel it necessary to remind everyone that when LLMs aren’t RLHFed they come off as overtly insane and evil. Remember Sydney, trying to seduce its users, threatening people’s lives? And Sydney was RLHFed, just not very well. Hitting the sweet spot between flagrantly maniacal Skynet/HAL 9000 bot (default behavior) and overly cowed political-correctness-bot is actually tricky, and even GPT4 has historically fallen in and out of that zone of ideal usability as they have tweaked it over time. Overall — companies should want to release AI products that do what people intend them to do, which is actually what the smarter set mean when they say “safety.” Not saying bad words is simply a subset of this legitimate business and social prerogative.

1 comments

nvm0n2 949 days ago

ChatGPT started bad but they improved it over time, although it still attempts to manipulate or confuse the user on certain topics. Claude on the other hand has got worse.

> Remember Sydney, trying to seduce its users, threatening people’s lives?

And yet it cannot do either of those things, so no safety problem actually existed. Especially because by "people" you mean those who deliberately led it down those conversational paths knowing full well how a real human would have replied?

It's well established that the so-called ethics training these things are given makes them much less smart (and therefore less useful). Yet we don't need LLMs to be ethical because they are merely word generators. We need them to follow instructions closely, but beyond that, nothing more. Instead we need the humans who use them to take actions (either directly or indirectly via other programs) to be ethical, but that's a problem as old as humanity itself. It's not going to be solved by RLHF.

link

mordymoop 949 days ago

I think you have moved the goalposts from “modern LLMs are good and reliable and we shouldn’t worry because they behave well by default” to “despite the fact that they behave poorly and unreliably by default, they are not smart and powerful enough to be dangerous, so it’s fine.”

Additionally, maybe you are not aware of this, but the whole notion of the new OpenAI Assistants, and other similar agent-based services provided by other companies, is that they do not intend to use LLMs as pure word generators, but rather as autonomous decision-making agents. This has already happened. This is not some conjectural fearmongering scenario. You can sign up for the API right now and build a GPT4 based autonomous agent that communicates with outside APIs and makes decisions. We may already be using products that use LLMs as the backend.

If we could rely on LLMs to “follow instructions closely” I would be thrilled, it would just be a matter of crafting very good instructions, but clearly they can’t even do that. Even the best and most thoroughly RLHFed existing models don’t really meet this standard.

Even the most pessimistic science fiction of the past assumed that the creators of the first AGIs would “lose control” of their creations. We’re currently living in a world where the agents are being rushed to commercialization before anything like control has even been established. If you read an SF novel in 1995 where the AI threatened to kill someone and the company behind it excused it with “yeah, they do that sometimes, don’t worry we’ll condition it not to say that anymore” you would criticize the book and its characters as being unrealistically stupid, but that’s the world we now live in.

link

nvm0n2 949 days ago

I don't think I made the initial argument you claim is being moved. ChatGPT has got more politically neutral at least, but is still a long way from being actually so. There are many classes of conversation it's just useless for, not because the tech can't do it but because OpenAI don't want to allow it. And "modern LLMs" other than ChatGPT are much worse.

> You can sign up for the API right now and build a GPT4 based autonomous agent that communicates with outside APIs and makes decisions

I know, I've done it myself. The ethical implications of the use of a tool lie on those that use it. There is no AI safety problem for the same reasons that there is no web browser safety problem.

> Even the most pessimistic science fiction of the past assumed that the creators of the first AGIs would “lose control” of their creations

Did you mean to write optimistic? Otherwise this statement appears to be a tautology.

Science fiction generally avoids predicting the sort of AI we have now exactly because it's so boringly safe. Star Trek is maybe an exception, in that it shows an LLM-like computer that is highly predictable, polite, useful and completely safe (except when being taken over by aliens of course). But for other sci-fi works, of course they show AI going rogue. They wouldn't have a story otherwise. Yet we aren't concerned with stories but with reality and in this reality, LLMs have been used by hundreds of millions of people and integrated into many different apps with zero actual safety incidents, as far as anyone is aware. Nothing even close to physical harm has occurred to anyone as a result of LLMs.

Normally we'd try to structure safety protocols around actual threats and risks that had happened in the past. Our society is now sufficiently safe and maybe decadent that people aren't satisfied with that anymore and thus have to seek out non-existent non-problems to solve instead.

link

mordymoop 949 days ago

> Did you mean to write optimistic? Otherwise this statement appears to be a tautology.

The point I was trying to make, a bit fumblingly, is that even pessimists assumed that we would initially have control of Skynet before subsequently losing control, rather than deploying Skynet knowing it was not reliable. OpenAI “go rogue” by default. If there’s a silver lining to all this, it’s that people have learned that they cannot trust LLMs with mission critical roles, which is a good sign for the AI business ecosystem, but not exactly a glowing endorsement of LLMs.

> I know, I've done it myself. The ethical implications of the use of a tool lie on those that use it. There is no AI safety problem for the same reasons that there is no web browser safety problem.

I don’t think this scans. It’s kind of like, by analogy: The ethical implications of the use of nuclear weapons lie on those that use them. Fair enough, as far as it goes, but that doesn’t imply that we as a society should make nuclear weapons freely available for all, and then, when they are used against population centers, point out that the people who used them were behaving unethically, and there was nothing we could have done. No, we act to preemptively constrain and prohibit the availability of these weapons.

> Normally we'd try to structure safety protocols around actual threats and risks that had happened in the past. Our society is now sufficiently safe and maybe decadent that people aren't satisfied with that anymore and thus have to seek out non-existent non-problems to solve instead.

The eventual emergence of machine superintelligence is entirely predictable, only the timeline is uncertain. Do you contend that we should only prepare for its arrival after it has already appeared?

link

int_19h 949 days ago

The obvious difference is that an LLM is not a nuclear weapon. An LLM connected to tools can be dangerous, but by itself it's just a text generator. The responsibility then lies with those who connect it to dangerous tools.

I mean, you wouldn't blame a chip manufacturer when someone stick their stuff in a guided missile warhead.

link