Show HN: I built a small OSS kernel for replaying and diffing AI decisions
1 points by koistya 140 days ago
I’ve been hacking on a small open-source project called Verist and figured I’d share it here to get some early feedback.

What kept bothering me with AI features in production wasn’t really how to build them, but everything that comes after: explaining why something happened, reproducing it weeks later, or changing prompts/models without breaking things in subtle ways.

Logs helped a bit, but not enough. Agent frameworks felt too implicit for my taste. And model upgrades were honestly scary: outputs would change, and it wasn’t always obvious where or why.

So I ended up building a very small, explicit kernel where each AI step can be replayed, diffed, and reviewed. Think something like Git-style workflows for AI decisions, but without trying to be a framework or a runtime.

It’s not an agent framework, not a chat UI, and not a platform. It’s just a TypeScript library focused on explicit state, audit events, and replay + diff.
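
To make "replay + diff" concrete, here is a rough sketch of the idea for a single step. Every name in it (AuditEvent, runStep, replay, diffOutputs, and the two classifier stubs) is made up for illustration and is not Verist's actual API; see the repo for that. It also assumes a step's output is a flat JSON object and uses synchronous stubs in place of real model calls so the result is deterministic.

```typescript
// Illustrative only: hypothetical names, not Verist's actual API.
type Json = string | number | boolean | null;
type Output = Record<string, Json>;

// One recorded decision: what went in, what came out, and which
// prompt/model version produced it.
interface AuditEvent {
  stepId: string;
  version: string;
  input: string;
  output: Output;
}

// A real step would call a model and be async; a sync stub keeps this deterministic.
type Step = (input: string) => Output;

// Run a step and capture an audit event for it.
function runStep(stepId: string, version: string, step: Step, input: string): AuditEvent {
  return { stepId, version, input, output: step(input) };
}

// Replay: feed the recorded input into a (possibly changed) step.
function replay(event: AuditEvent, version: string, step: Step): AuditEvent {
  return runStep(event.stepId, version, step, event.input);
}

interface FieldChange {
  field: string;
  before: Json | undefined;
  after: Json | undefined;
}

// Diff: list the top-level fields whose values differ between two outputs.
function diffOutputs(before: Output, after: Output): FieldChange[] {
  const fields = Object.keys({ ...before, ...after }).sort();
  return fields
    .filter((f) => before[f] !== after[f])
    .map((f) => ({ field: f, before: before[f], after: after[f] }));
}

// Stand-ins for "the same step before and after a prompt change".
const classifyV1: Step = (input) => ({
  label: input.indexOf("refund") >= 0 ? "billing" : "other",
  confidence: 0.7,
});
const classifyV2: Step = (input) => ({
  label: input.indexOf("refund") >= 0 ? "refund_request" : "other",
  confidence: 0.7,
});

const recorded = runStep("classify-ticket", "prompt-v1", classifyV1, "I want a refund");
const replayed = replay(recorded, "prompt-v2", classifyV2);
const changes = diffOutputs(recorded.output, replayed.output);
console.log(changes);
```

Here the diff reports that only the label field changed (from "billing" to "refund_request"), which is the kind of thing a human can review before shipping a prompt or model change.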

Repo: https://github.com/verist-ai/verist

I’m especially curious if others here have run into similar issues shipping AI features to prod, or if this feels like overkill. Happy to answer questions or hear criticism.

1 comments

I solved this with Dagger and OCI layers, so I get all the same features, plus an isolated environment and a ubiquitous format. The ADK framework I use does most of the rest.

It's not a prod-time thing, though, unless you count a coding agent using it as prod, since that is the real thing at development time.

That's a really cool approach, love it! Treating agent steps as build artifacts (OCI layers) makes total sense for coding agents or offline evaluations where you need filesystem-level reproducibility.

Verist, on the other hand, is aiming for the application layer: lightweight, low-latency observability for production user-facing apps where spinning up containers per-step isn't feasible. I think there's space for both: Dagger/OCI for heavy 'environment' replay, and Verist for semantic 'decision' replay.