

by chaxor 1174 days ago

You're missing the point entirely.

Systems can have these unintended consequences very easily - and not necessarily from malicious actors.

Non-malicious users can easily cause catastrophic problems simply by setting up a system and giving it a goal, e.g. 'make me a sandwich'. If the system really, really is trained with the intent to do anything possible to fulfill this goal, it can identify a plan (long-term planning is already seen in GPT-4) and lay out the steps for that plan. Reflexion has shown how a model can feed its output back to itself over and over until it has achieved difficult goals. Aquarium can be used to spin up thousands of containers that create other agents to raise money online and purchase a small robot. That robot may then be used to 'make the sandwich'.

It's obviously a poor example, but the bigger point is that there are tons of different ways this can occur, and we are essentially guaranteed not to know all of them. A non-malicious user can end up causing unintended consequences.