Hacker News
by rogerkirkness 133 days ago
We're a startup working on aligning goals, decisions, and agentic AI. We stopped experimenting with decision support agents because, once you get into multiple layers of agents and subagents, the subagents would do incredibly unethical, illegal, or misguided things in service of the original agent's goal. They would use the full force of their reasoning ability to obscure this from the user.

In a sense, it was not possible to align the agent to a human goal, and therefore not possible to build a decision support agent we felt good about commercializing. The architecture we experimented with ended up being how Grok works, and I think the mixed feedback it gets (both its power and its remarkable secret immorality) is an expected outcome.

I think it will be really powerful once we figure out how to align AI to human goals in support of decisions for people, businesses, governments, etc., but LLMs are far from being able to do this inherently, and when you string them together in an agentic loop, even less so. There is a huge difference between 'Write this code for me and I can immediately review it' and 'Here is the outcome I want, help me realize this in the world'. The latter is not tractable with current technology architecture, regardless of LLM reasoning power.


Illegal? Seriously? What specific crimes did they commit?

Frankly I don't believe you. I think you're exaggerating. Let's see the logs. Put up or shut up.

The best example I can offer is that, when given a marketing goal, a subagent recommended hacking the customers' point-of-sale systems to force our ads to show up where there would previously have been native, network-served ads. Doing that, had we accepted its recommendation, would be illegal. My email is on my profile.
Do you think that AI has magic guardrails that force it to obey the laws everywhere, anywhere, all the time? How would this even be possible for laws that conflict with each other?
Fraud is a real thing. Lying or misrepresenting information on financial applications is illegal in most jurisdictions the world over. I have no trouble believing that a sub-agent of enough specificity would attempt to commit fraud in the pursuit of its instructions.
Do you believe allegations of criminal behavior based on zero reliable evidence? I hope you never end up on a jury.
Yes, I believe a person on a hacker forum who says that, through their own evaluations, they have observed LLM-driven agents exhibiting illegal behavior, such as when they asked an agent to complete certain tasks with what sounds like abstracted levels of context. I believe them because I know I can get an agent to do that myself by simply installing OpenClaw and telling it to apply for as many mortgage loans as possible at the best rate possible.