
by jaen 99 days ago

People building these Rube Goldberg contraptions: do you actually run evaluations to check whether this is any better than, e.g., giving it access to a Python REPL, or just toughing it out with random tools composed via shell scripts?

Why would an LLM be better trained to access Redis via a FS vs. a native library API?

Makes no sense.

2 comments

sgbeal 99 days ago

> Why would an LLM be better trained to access Redis via a FS vs. a native library API?

Limiting the potential blast radius.

If you give an agent "access to a Python REPL" (your words), you're giving it access to all of Python. i.e. you're paving the road to your own destruction when the agent goes awry. In the case of a Python interpreter, you're basically handing it an eight-lane highway upon which all sorts of pile-ups and havoc can happen.

By limiting its access to specific operations via well-defined endpoints (which is what the AGFS approach is), you're trimming that eight-lane highway back to a bicycle path.
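To make the contrast concrete, here is a minimal sketch of the "bicycle path" idea: a wrapper that exposes only an allowlisted set of operations. Everything in it is hypothetical (the `RestrictedStore` name, the `call` method, the in-memory dict standing in for a real backend); it is not how AGFS or any Redis client is actually implemented, and it says nothing about whether an agent performs better with it.

```python
from typing import Callable, Dict, Optional


class RestrictedStore:
    """Hypothetical key-value endpoint that only permits allowlisted operations."""

    def __init__(self) -> None:
        # In-memory dict used as a stand-in for a real backend (assumption).
        self._data: Dict[str, str] = {}
        # The allowlist: anything not named here cannot be invoked.
        self._ops: Dict[str, Callable[..., object]] = {
            "get": self._get,
            "set": self._set,
        }

    def _get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def _set(self, key: str, value: str) -> str:
        self._data[key] = value
        return "OK"

    def call(self, op: str, *args: str) -> object:
        """Dispatch an operation by name, rejecting anything off the allowlist."""
        handler = self._ops.get(op)
        if handler is None:
            raise PermissionError(f"operation not allowed: {op}")
        return handler(*args)


store = RestrictedStore()
store.call("set", "greeting", "hello")
value = store.call("get", "greeting")
```

An agent handed only `store.call` can read and write keys, but a request such as `store.call("flushall")` raises `PermissionError` instead of reaching the backend. A Python REPL offers no such boundary by default.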


jaen 98 days ago

That didn't answer or reply to the original question... Security is orthogonal to performance.

My question was, how is the performance better? (as implied by the word evaluations)

(also the original post was about exposing all sorts of random ops via a file system which are accessed via general shell tools most of the time, so pretty likely there's basically zero added security...)


calvinmorrison 99 days ago

Then again, with Plan 9 namespaces it's trivial to build sandboxes.

Why are you upset?

(I'm not upset at all, I'm confused.)
