by jasfi
985 days ago
Properly training LLMs, and adding filters to catch unwanted behavior, can mitigate this. Even without all that, the agent would need mechanisms to protect itself, and those mechanisms would themselves have to cause harm. The scenario you suggest is so unlikely, given all the protections that would be in place, that it could only succeed if someone deliberately set out to make LLMs behave maliciously. At the end of the day, it comes back to people and their goals.

I feel like unless we gain the ability to debug each node the way we do with actual software, we won't be able to solve the alignment problem. I saw on HN that Anthropic is working on it, but I'm not knowledgeable enough on the subject to say whether it's actually feasible.
Probably the best-case scenario for humanity is that LLMs plateau somehow and don't get much better for quite some time.