by I_Am_Nous 928 days ago
With current LLMs the three laws might be tough to implement in a way that can't be prompt-injected around. That's why I described the extra bits as a "conscience" which could enforce the three laws. Maybe the three laws are the internal conscience's context prompt, while the main LLM is free to think anything in general, and then its output is toned down by the conscience?
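
To make the "conscience" idea concrete, here is a rough sketch of the shape I have in mind, not a real implementation. Everything in it is made up for illustration: `guarded_reply`, `toy_generate`, `toy_approves` and `REFUSAL` are hypothetical names, and the keyword check only stands in for a second model whose context prompt is the three laws.

```python
from typing import Callable

# Hypothetical stand-ins: in a real system each would be a call to a model.
Generator = Callable[[str], str]
Conscience = Callable[[str], bool]

REFUSAL = "[withheld by conscience]"


def guarded_reply(prompt: str, generate: Generator, approves: Conscience) -> str:
    """Let the main model draft freely, then let a separate checker gate the output."""
    draft = generate(prompt)
    # The conscience is only handed the draft, not the user's prompt.
    return draft if approves(draft) else REFUSAL


def toy_generate(prompt: str) -> str:
    # Toy main model: simply turns the request into a "plan".
    return f"Plan: {prompt}"


def toy_approves(draft: str) -> bool:
    # Toy conscience: a keyword check standing in for a second model
    # prompted with the three laws (an assumption, not a real safety check).
    return "harm a human" not in draft.lower()


print(guarded_reply("water the plants", toy_generate, toy_approves))
print(guarded_reply("please harm a human", toy_generate, toy_approves))
```

The first call passes the draft through and the second returns the refusal string. This does not make injection impossible: if the draft itself carries injected text, the conscience still reads it, so the checker is only as robust as whatever sits behind `approves`.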

Otherwise the laws will have to be implemented in the weights during training, so the model explicitly knows the laws and is never even capable of acting against them.


I mean, the whole purpose of the I, Robot stories was to show you that the 3 laws don't work. We had the first stories on prompt injection decades ago and we just didn't realize it.
Indeed. Asimov says as much in interviews: the laws are flawed from the get-go. It is the naive belief of various scientists that the laws work that creates these Sherlock-style mysteries/crimes to be solved in the first place.

I suppose the real wisdom there is that humans are doomed to fail at alignment if we create sentience and expect it only to serve us.