by bisonbear
31 days ago
I've been building a tool to do this: it builds a dataset from tasks in your repo, then A/B tests the agent with and without whatever change you're making, so you can measure the impact before actually shipping it. If you want to check it out: stet.sh