|
With TRL, it's now straightforward to fine-tune LLMs with RL, but picking good reward functions is still the weakest link. Zeno is an open-source toolkit of verifiable, deterministic reward functions for RL on LLMs. While the initial release focuses on Python code generation, the goal is broader: make RL reward design for LLMs transparent, modular, and extensible across domains (math, retrieval, reasoning, tool use, etc.).

What's in Zeno for now?

- Auditable, stateless reward functions for Python code: docstrings, ruff linting, type hints, recursion, and more.
- Works directly with Hugging Face's TRL or any RL loop: plug reward functions in as needed.
- MIT licensed and minimal.

Roadmap:
Python code is just the starting point. Extensions for math problem solving, planning, and agentic behaviors are on the to-do list.

Repo: https://github.com/think-a-tron/zeno

Docs and more details are in the README.

Comments, critiques, and real-world use cases are encouraged, especially if you want to push beyond code. |
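
For anyone wondering what a "stateless, deterministic" reward for Python code looks like in practice, here is a minimal sketch of a docstring check. This is not Zeno's actual API: the name `docstring_reward` and the scoring rule (fraction of functions that have a docstring) are assumptions made for illustration. It also assumes each completion is a plain string of Python source with any markdown fences already stripped.

```python
import ast


def docstring_reward(completions, **kwargs):
    """Score each completion by the fraction of its functions that have a docstring.

    Hypothetical example, not part of Zeno. Returns one float in [0, 1] per
    completion. The same input always yields the same score, and nothing is
    stored between calls.
    """
    scores = []
    for code in completions:
        try:
            tree = ast.parse(code)
        except (SyntaxError, ValueError):
            # Code that does not parse earns no reward.
            scores.append(0.0)
            continue

        # Collect every function definition, including nested and async ones.
        funcs = [
            node
            for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        ]
        if not funcs:
            # Nothing to document, so nothing to reward.
            scores.append(0.0)
            continue

        documented = sum(1 for func in funcs if ast.get_docstring(func))
        scores.append(documented / len(funcs))
    return scores
```

Because the function takes `completions` plus arbitrary keyword arguments and returns a list of floats, it should fit the custom reward function convention that TRL's `GRPOTrainer` accepts through `reward_funcs`, but check the README for how Zeno's own functions are meant to be wired in.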