Hacker News

by ultmaster 317 days ago

I trained the agent itself to write SQL, execute it, check the results, and then rewrite the query until it is correct. Both the write and the rewrite policies are optimized with RL, using a client–server setup in Agent Lightning and a LangGraph state machine. On a 500-sample subset of the Spider evaluation set, Qwen2.5-Coder-3B with a 4096-token context reaches 80.4% with up to three write/rewrite turns and 80.2% with a single turn. After training, Qwen2.5-Coder-1.5B can even outperform the untrained Qwen2.5-Coder-3B. I compared multiple models and settings, hoping to shed light on how to tune AI agents.
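The post describes the agent's control flow only at a high level, so here is a minimal sketch of the write, execute, check, rewrite loop. It is plain Python with the standard-library `sqlite3` module as the execution backend. It does not use the LangGraph or Agent Lightning APIs, and it involves no RL training. `AgentState`, `run_agent`, the `write`/`rewrite` policy signature, and `max_turns` are hypothetical names. In the described setup both policies are the RL-trained LLM and the model itself checks the results, so the "no error and non-empty result" check below is only an assumed stand-in.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical policy signature: (question, schema, previous_sql, feedback) -> SQL.
# In the post, both the write and rewrite policies are the RL-trained LLM.
Policy = Callable[[str, str, Optional[str], Optional[str]], str]


@dataclass
class AgentState:
    question: str
    schema: str
    sql: Optional[str] = None
    feedback: Optional[str] = None
    turns: int = 0  # number of times a query has been executed
    done: bool = False


def execute(conn: sqlite3.Connection, sql: str) -> Tuple[Optional[List[tuple]], Optional[str]]:
    """Run the query; return (rows, None) on success or (None, error_message)."""
    try:
        return conn.execute(sql).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def run_agent(conn: sqlite3.Connection, question: str, schema: str,
              write: Policy, rewrite: Policy, max_turns: int = 3):
    """Write -> execute -> check -> rewrite, capped at max_turns executions."""
    state = AgentState(question, schema)
    state.sql = write(question, schema, None, None)  # "write" node
    while True:
        state.turns += 1
        rows, error = execute(conn, state.sql)  # "execute" node
        # "check" node: stand-in heuristic (no error and a non-empty result);
        # in the post, the model itself checks the results.
        if error is None and rows:
            state.done = True
            state.feedback = None
            return state, rows
        if state.turns >= max_turns:
            return state, rows  # give up and keep the last attempt's result
        state.feedback = error or "query returned no rows"
        state.sql = rewrite(question, schema, state.sql, state.feedback)  # "rewrite" node


# Toy usage with scripted "policies" in place of an LLM.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 30), ("Bo", 41)])


def toy_write(question, schema, prev_sql, feedback):
    return "SELECT nme FROM singer WHERE age > 35"  # deliberate column typo


def toy_rewrite(question, schema, prev_sql, feedback):
    return prev_sql.replace("nme", "name")  # scripted fix after the error message


state, rows = run_agent(conn, "Which singers are older than 35?", "singer(name, age)",
                        toy_write, toy_rewrite)
print(state.turns, rows)  # prints: 2 [('Bo',)]
```

In this sketch `max_turns` caps how many times a query is executed, with the initial write counting as the first turn. Whether the post's "three turns" are counted the same way is an assumption.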

Full article: https://medium.com/@yugez/training-ai-agents-to-write-and-se...

Related projects:

- Agent Lightning as the glue: https://github.com/microsoft/agent-lightning

- verl for RL algorithms: https://github.com/volcengine/verl

- vLLM for efficient rollouts: https://github.com/vllm-project/vllm

- AgentOps for collecting training data (telemetry): https://github.com/AgentOps-AI/agentops

- LangGraph for agent orchestration: https://www.langchain.com/langgraph