by creer 945 days ago

The current issue seems to be mostly one of policy. That is, the current LLMs have designed-in capabilities that their owners prefer not to make available quite yet. It seems the LLM is "more intelligent / more gullible" than the policy designers. I don't know that you can aim for intelligence (or an intelligence simulacrum) without also getting gullibility. It's hard to aim for "serve the needs of the user" while also aiming for "second-guess everything the user asks you". This general direction just begs for cat-and-mouse prompt engineering, and indeed that was among the first things that everyone tried.

A second and imo more interesting issue is actually keeping an agent AI from gaining capabilities. Can you prevent the agent from learning a new trick from the user? For example, the user could install internet access or a wallet on their own side and bridge access to the agent.

A second agent could listen in on the conversation, classify it, and decide whether it is going the "wrong" way. And we are back to cat and mouse.
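
To make the "second agent" idea concrete, here is a minimal sketch, not a real design. Everything in it is hypothetical: the `Monitor` class, the `BLOCKED_TOPICS` list, and the keyword match, which stands in for whatever classifier (probably another model) a real monitor would use.

```python
from dataclasses import dataclass, field

# Hypothetical policy: topics the owners do not want the agent to pick up.
BLOCKED_TOPICS = ("wallet", "internet access")


@dataclass
class Monitor:
    """A second 'agent' that listens in on the conversation."""
    blocked: tuple = BLOCKED_TOPICS
    flagged: list = field(default_factory=list)

    def classify(self, message: str) -> bool:
        # Assumption: a plain substring match stands in for a real classifier.
        text = message.lower()
        return any(topic in text for topic in self.blocked)

    def observe(self, conversation):
        # Let messages through until one is flagged, then cut the conversation.
        allowed = []
        for message in conversation:
            if self.classify(message):
                self.flagged.append(message)
                break
            allowed.append(message)
        return allowed


monitor = Monitor()
passed = monitor.observe(["hello", "here is a Wallet you can use", "more"])
```

In this sketch `passed` contains only the first message, and the wallet message ends up in `monitor.flagged`. A trivially obfuscated message such as "w a l l e t" is not caught by `classify`, which is the cat-and-mouse problem in miniature.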