I trained the agent itself to write SQL, run it, check the results, and rewrite the query until it is correct. The write and rewrite policies are optimized with RL, using a client–server setup in Agent Lightning and a LangGraph state machine. On a 500-sample Spider evaluation subset, Qwen2.5-Coder-3B with a 4096-token context reaches 80.4% with three write-and-rewrite turns and 80.2% with one turn. After training, Qwen2.5-Coder-1.5B can even outperform the untrained Qwen2.5-Coder-3B. I compare multiple models and settings, hoping to shed light on tuning AI agents.
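
The write, run, check, rewrite loop described above can be sketched roughly as follows. This is a minimal illustration only, not the Agent Lightning or LangGraph implementation: `write`, `rewrite`, `stub_write`, and `stub_rewrite` are hypothetical stand-ins for the trained policies, the "check" step is simplified to "the query executed without error", `max_turns` is assumed to count execution attempts, and SQLite is assumed as the database.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

Rows = Optional[List[tuple]]


def run_sql(conn: sqlite3.Connection, query: str) -> Tuple[Rows, Optional[str]]:
    """Execute a query and return (rows, error_message)."""
    try:
        return conn.execute(query).fetchall(), None
    except sqlite3.Error as exc:
        return None, str(exc)


def write_check_rewrite(
    conn: sqlite3.Connection,
    question: str,
    write: Callable[[str], str],              # hypothetical write policy
    rewrite: Callable[[str, str, str], str],  # hypothetical rewrite policy
    max_turns: int = 3,
) -> Tuple[str, Rows]:
    """Write a query, run it, and rewrite it on failure, up to max_turns runs."""
    query = write(question)
    rows, error = run_sql(conn, query)
    turns = 1
    # Simplified check: a turn succeeds when the query runs without an error.
    while error is not None and turns < max_turns:
        query = rewrite(question, query, error)
        rows, error = run_sql(conn, query)
        turns += 1
    return query, rows


# Toy database and stub policies, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 31), ("Bo", 45)])


def stub_write(question: str) -> str:
    # The first draft deliberately contains a misspelled column name.
    return "SELECT nme FROM singer WHERE age > 40"


def stub_rewrite(question: str, query: str, error: str) -> str:
    # A real policy would condition on the error text; the stub just patches it.
    return query.replace("nme", "name")


final_query, rows = write_check_rewrite(
    conn, "Which singers are older than 40?", stub_write, stub_rewrite
)
print(final_query, rows)
```

With `max_turns=1` the loop never calls `rewrite`, which corresponds to the one-turn setting; larger values give the policy more chances to repair a failing query.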