Hacker News new | ask | show | jobs
by CompoundEyes 480 days ago
I put in a code reviewer that runs and comments when a pull request is created using Github actions and Microsoft GenAIScript. It's pretty straightforward. The key thing is we have total control over the prompt to fit our repo and devs needs, can make it multi-stage and deterministic using Typescript code or use agents in GenAIScript to open adjacent files for more context. The value we've received is that a dev can look over the review to catch anything they might have missed and make changes all before another dev looks at it. That saves time. I've seen devs open draft pull requests to get preliminary feedback on work in progress. The reviewer script is versioned with the repo. Currently using a mix of gpt-4o and gpt-4o-mini in parts of the script to do smaller tasks.
2 comments

I’d be interested in seeing the scripts if you are able to share (redacted) versions of them
I can't do that at the moment but the GenAIScript project repo has a good plug and play version that I built upon. In my opinion agent Typescript classes that return structured data tied to interfaces/types with their own memory is where it's at. Also been experimenting with a state machine class to hot potato the output between an agent that judges the results and the worker agent until satisfied.
How do you make it deterministic?
Sorry I meant that it's javascript / typescript so we can deterministically orchestrate a series of prompts and shape their output exactly how we'd like. Returning the review as structured output as a JSON object is very helpful for this. If the review result seems bungled, run a judge prompt at the end and tell it to go try again ^_^.
That makes sense. You could even run against multiple models or future models then, right? I can see some value in that because maybe 2 years from now the models will be able to surface issues that weren’t detected originally. I suppose you could run against the whole codebase in the future, but could also imagine something that could track down where a bug was introduced.

Do you save the reviews or discard them?

Not presently saving the reviews but we could go back and export the DIFF text for two commit hashes with git to recreate what was reviewed in a CI pipeline. This is also handy for tweaking the prompt towards what you're expecting to see on local machine.

I agree the script should be versioned because the output of the one model to the next varies so much and the potential that even an updated version of one model could break it. Treating model versions like an npm package dependency almost.