Show HN: AgentScript AI – Build Agents that think in code (github.com)
4 points by kedrzu 533 days ago
AgentScript is a new AI Agent framework with a novel architecture to build reliable and secure agents.

Our framework has the LLM generate a plan upfront as code (JavaScript), which we then parse into an AST and execute in a managed runtime. That gives you built-in state management and better visibility for debugging via the AST.

By making the LLM express the execution plan as code, agents can think more abstractly about the task and do not even need to know all the data to perform operations on it or make decisions.

Data is held in local variables and can be passed to tools, which can be ordinary deterministic functions or LLM-enabled ones built with LangChain or any other library.
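To make the idea above concrete, here is a minimal sketch of interpreting a plan by walking its AST instead of running it. It is not AgentScript's implementation: Python's standard `ast` module stands in for the JavaScript parser, and the tool names (`fetch_balance`, `add`), the plan text, and `run_plan` are all hypothetical.

```python
import ast

# Hypothetical tools: plain deterministic functions stand in for real ones.
def fetch_balance(account):
    return {"treasury": 1200, "ops": 300}[account]

def add(a, b):
    return a + b

TOOLS = {"fetch_balance": fetch_balance, "add": add}

# A plan an LLM might emit. It refers to data only through variable names,
# so the model never has to see the actual balances.
PLAN = """
treasury = fetch_balance("treasury")
ops = fetch_balance("ops")
total = add(treasury, ops)
"""

def evaluate(node, variables):
    """Evaluate one expression node without ever calling eval or exec."""
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        tool = TOOLS[node.func.id]  # KeyError for anything not registered
        args = [evaluate(arg, variables) for arg in node.args]
        return tool(*args)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")

def run_plan(source):
    """Execute `name = expression` statements one by one, keeping locals in a dict."""
    variables = {}
    for statement in ast.parse(source).body:
        if not (isinstance(statement, ast.Assign)
                and len(statement.targets) == 1
                and isinstance(statement.targets[0], ast.Name)):
            raise ValueError("only `name = expression` statements are supported")
        variables[statement.targets[0].id] = evaluate(statement.value, variables)
    return variables

result = run_plan(PLAN)
print(result["total"])  # prints 1500
```

Because only registered tool names are callable, a plan that tries to call anything else fails at lookup rather than executing.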

State management and human in the loop: this is where our approach really shines, especially for agents handling critical or sensitive workflows in regulated spaces such as fintech, payments, and crypto (a treasury agent, for example).

Because AgentScript interprets the AST rather than actually running the generated code, execution can be paused at each statement or tool call. The execution state can be serialized and stored in a database, then retrieved and resumed from where it stopped. Each tool will be able to store its own state and wait for a user interaction, an event, or a period of time. Tools will also have built-in interactivity and approval mechanisms, so adding a human in the loop will be easy.
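The pause, serialize, and resume cycle described above could look roughly like the sketch below. It is an illustration under stated assumptions, not AgentScript's actual design: Python's `ast` module stands in for the JavaScript AST, the saved state is the plan source plus a statement index and the variables, every statement is assumed to have the form `name = tool(args)`, and the tools (`quote_fee`, `transfer`) and the `NEEDS_APPROVAL` set are hypothetical.

```python
import ast
import json

# Hypothetical tools; "transfer" is marked as needing human approval.
def quote_fee(amount):
    return amount // 100

def transfer(amount, fee):
    return f"sent {amount} with fee {fee}"

TOOLS = {"quote_fee": quote_fee, "transfer": transfer}
NEEDS_APPROVAL = {"transfer"}

PLAN = """
fee = quote_fee(5000)
receipt = transfer(5000, fee)
"""

def new_state(source):
    # Everything needed to resume is plain JSON: source, position, variables.
    return {"source": source, "pc": 0, "variables": {}, "approved": []}

def evaluate(node, variables):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    raise ValueError(f"unsupported argument: {type(node).__name__}")

def step_until_blocked(state):
    """Run statements until done or until a tool call awaits approval."""
    body = ast.parse(state["source"]).body
    while state["pc"] < len(body):
        statement = body[state["pc"]]
        call = statement.value          # assumed shape: name = tool(args)
        tool_name = call.func.id
        if tool_name in NEEDS_APPROVAL and state["pc"] not in state["approved"]:
            return "waiting_for_approval"
        args = [evaluate(arg, state["variables"]) for arg in call.args]
        state["variables"][statement.targets[0].id] = TOOLS[tool_name](*args)
        state["pc"] += 1
    return "done"

state = new_state(PLAN)
status = step_until_blocked(state)   # stops before the transfer
saved = json.dumps(state)            # this string could be stored in a database

# Later, possibly in another process, after a human approves the pending statement:
restored = json.loads(saved)
restored["approved"].append(restored["pc"])
final_status = step_until_blocked(restored)
print(status, final_status, restored["variables"]["receipt"])
# prints: waiting_for_approval done sent 5000 with fee 50
```

Storing the source and an index, rather than the parsed tree itself, keeps the saved state small and JSON-friendly, at the cost of re-parsing on resume.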

Here is a video showing how to build AI agents using AgentScript https://youtu.be/b3MlCpBoxNM

Here is a link to our Discord https://discord.gg/hEYMnj62DT

Please give it a spin and help us improve! This is our alpha release. We'd love to hear your feedback on what to build next.

This is our roadmap:

- Execution serialization and deserialization

- More JS features:

  - `if` statements

  - `for` loops

  - template literals

  - arrow functions

  - unary and binary operators

- Input variables

- Tool state

- Tool interactivity

- Observability and debugging

- Python implementation (any Python gurus? Ping us!)

---

Let Them Code!

1 comments

Oh no! Please don't do this. This is one of those tarpit ideas in the "AI" space.

> do not even need to know all the data to perform operations on it or make decisions

If I know code generation is going to be possible without any contextual information, I might as well generate the code using Copilot or Cursor and commit it. Why do I need a runtime agent to do it?

What if the control flow has to change based on a result it receives?

What if the plan up front is wrong and needs to change halfway? Do I run the entire thing again with a new plan? What if my tools are not idempotent?

What if it generates a recursive loop?

Also, if I really want to do this, and if my tools are safe, why don't I just do a raw OpenAI / Claude call and get Deno Subhosting [1] or E2B [2] to run it?

[1] https://deno.com/subhosting

[2] https://e2b.dev/

Thanks for the feedback, let me break it down:

The code is just a representation of the agent's execution flow. That is different from static workflows like those in Glide, and from the typical ping-pong style of ReAct agent found in other frameworks.

It's not meant to replace your application logic. It serves as a dynamic plan that the LLM generates to complete an ad hoc task, so there is little sense in committing it to a repo.

In your app code, control flow also changes based on data using `if` statements. And you don't need to know the data to write your code, right? An LLM can handle those as well.

We are working on making the execution more dynamic, possibly allowing code to be generated and executed progressively when data arrives and new prompts are given.

Tools will indeed need to be idempotent, especially for the interactivity and human-in-the-loop mechanisms we are working on.
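One common way to get that property is to record each tool result under an idempotency key, so a replayed call returns the recorded result instead of repeating the side effect. The sketch below is only an illustration; `IdempotentTool`, `send_payment`, and the in-memory `ledger` dict are hypothetical names, not part of AgentScript.

```python
import hashlib
import json

class IdempotentTool:
    """Wraps a side-effecting function so repeated calls with the same
    arguments return the recorded result instead of re-running it."""

    def __init__(self, name, func, ledger):
        self.name = name
        self.func = func
        self.ledger = ledger  # any persistent key-value store; a dict here

    def _key(self, args):
        # Assumes arguments are JSON-serializable.
        payload = json.dumps([self.name, args], sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def __call__(self, *args):
        key = self._key(list(args))
        if key not in self.ledger:
            self.ledger[key] = self.func(*args)
        return self.ledger[key]

payments_sent = []  # stands in for the real-world side effect

def send_payment(recipient, amount):
    payments_sent.append((recipient, amount))
    return f"payment #{len(payments_sent)} to {recipient}"

ledger = {}
pay = IdempotentTool("send_payment", send_payment, ledger)

first = pay("acme", 250)
replayed = pay("acme", 250)  # e.g. the plan is re-run after a crash
print(first == replayed, len(payments_sent))  # prints: True 1
```

Keying on the arguments alone would also collapse two intentionally separate payments of the same amount, so a real design would likely add a run ID or statement position to the key.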

Detecting recursive loops and similar issues is a solvable technical problem; we'll get to those as well.

When you run code in Deno or E2B, you cannot pause it to wait for human interaction for a day, then resume it, so it won't really help with interactive agents that perform long-running operations and interact with the real world.

> In your app code, control flow also changes based on data using `if` statements. And you don't need to know the data to write your code, right?

Yes. Which is why it's in my app code, committed to source code, and with a guarantee of determinism. If you don't consult the runtime information (i.e. data), there's no reason why it has to be generated at runtime.

> We are working on making the execution more dynamic, possibly allowing code to be generated and executed progressively when data arrives and new prompts are given.

I think you just described how a ReAct agent works.

> you cannot pause it to wait for human interaction for a day, then resume it

Sure you can! Have you seen LangGraph, Trigger.dev, or Temporal?

Also, how do you pause based on an AST that might get invalidated if you recompute it on new contextual information?

Does the human keep approving until your AST works?

Sorry to be terse, but surely I'm missing something here.

Since you're computing the AST upfront, you'd get it wrong more often than a ReAct agent that decides as and when needed. So, do I pay OpenAI every time you make a mistake in the AST?

Or is your claim that your AST is infallible?