Hey HN, I built triplecheck because I wanted deep AI code review without paying $24/mo per seat.

The idea: instead of one LLM pass that drops comments (like CodeRabbit/Sourcery), triplecheck runs a full loop:

1. Reviewer finds bugs → structured findings with file, line, severity
2. Coder writes actual patches (search/replace diffs, not suggestions)
3. Tests run automatically to catch regressions
4. Loop until no new findings or max rounds
5. Judge scores the final result 0–10

The key insight: with local LLMs, compute is free, so you can afford to be thorough. Run 5 review passes from different angles, vote to filter noise, let the coder fix everything, and re-review until clean. Try doing that with a $0.03/1K token API.

What works well:
- Qwen3-Coder on vLLM/Ollama handles reviewer + coder surprisingly well
- Multi-pass voting genuinely reduces false positives — 3 passes agreeing > 1 pass guessing
- Tree-sitter dependency graph means the reviewer sees related files together, not random batches
- Scanned a 136K-line Go codebase (70 modules) and found real bugs, not just style nits

What's missing (honest):
- No GitHub PR integration yet (CLI only — you run it, read the report). This is the #1 gap vs CodeRabbit. It's on the roadmap.
- No incremental/diff-only review — it reviews whole files. Fine for local LLMs (free), wasteful for cloud APIs.
- Local LLMs still hallucinate fixes sometimes. The test gate catches most of them, but you should review the diff before merging.

Stack: Python, Click CLI, any OpenAI-compatible backend. Works with vLLM, Ollama, LM Studio, DeepSeek, OpenRouter, Claude CLI. Mix and match, e.g. a local Qwen running on an M3 Ultra for reviewer/coder + Claude for judge.

Would love feedback, especially from anyone running local models for dev tools. What review capabilities would make you actually use this in your workflow?
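
For anyone who wants the voting and loop idea in concrete form, here is a minimal sketch. It is not triplecheck's actual code: `Finding`, `vote`, `review_loop`, and the `review` / `fix` / `run_tests` callables are hypothetical names, matching findings by (file, line) is an assumption, and stopping when tests fail is just one possible policy.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Finding:
    # Hypothetical shape for a "structured finding with file, line, severity".
    file: str
    line: int
    severity: str
    message: str


def vote(passes: list[list[Finding]], min_votes: int = 3) -> list[Finding]:
    """Keep findings that at least `min_votes` review passes agree on."""
    counts: Counter = Counter()
    first_seen: dict[tuple[str, int], Finding] = {}
    for findings in passes:
        # Count each location at most once per pass, so a pass that repeats
        # itself cannot outvote the others.
        for key in {(f.file, f.line) for f in findings}:
            counts[key] += 1
        for f in findings:
            first_seen.setdefault((f.file, f.line), f)
    return [f for key, f in first_seen.items() if counts[key] >= min_votes]


def review_loop(
    review: Callable[[int], list[Finding]],  # one review pass per "angle"
    fix: Callable[[list[Finding]], None],    # coder applies patches
    run_tests: Callable[[], bool],           # True when the test suite passes
    max_rounds: int = 3,
    passes: int = 5,
    min_votes: int = 3,
) -> int:
    """Review, vote, fix, test; repeat until clean or max_rounds is reached.

    Returns the number of rounds in which fixes were applied.
    """
    rounds = 0
    while rounds < max_rounds:
        agreed = vote([review(angle) for angle in range(passes)], min_votes)
        if not agreed:
            break  # no new findings: done
        fix(agreed)
        rounds += 1
        if not run_tests():
            break  # assumption: stop and leave the diff for a human
    return rounds
```

Exact (file, line) matching is the simplest possible vote key. Different passes often report the same bug a line or two apart, so a real implementation would probably need fuzzier matching.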