by AndyNemmity 14 hours ago
This is why I blind A/B test everything.

I burn a ton of tokens, but things actually have to prove their value. And the vast majority of things do not come close to doing so.

I have my own AI agent full of stuff. I blind A/B test everything, but I also don't think the results are all that useful as a signal to others.

Just because I blind A/B tested something 4 months ago doesn't mean the result is still meaningful today.

Maybe the word choices I use dramatically impact things.

I do it because I can prove the value and see it with my own eyes. I don't even bother publishing the specific blind A/B tests.

Also, I've seen other people try to blind A/B test and get it very wrong. If your measurements aren't good, the test is meaningless.
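
To make "blind" and "good measurements" concrete, here is a minimal sketch of one way such a comparison could be set up. It is not the harness from the linked repo. The function names (`blind_pairs`, `unblind`, `sign_test_p`) are hypothetical, and the exact sign test is just one reasonable choice of measurement. It assumes you already have paired outputs from variant A and variant B for the same tasks, plus a judge (you or a script) who picks "left" or "right" without knowing which is which.

```python
import random
from math import comb


def blind_pairs(outputs_a, outputs_b, rng):
    """Randomize left/right placement so the judge cannot tell A from B."""
    trials = []
    for a, b in zip(outputs_a, outputs_b):
        a_is_left = rng.random() < 0.5
        left, right = (a, b) if a_is_left else (b, a)
        # a_is_left is the hidden key; only show "left" and "right" to the judge.
        trials.append({"left": left, "right": right, "a_is_left": a_is_left})
    return trials


def unblind(trials, picks):
    """Map the judge's 'left'/'right' picks back to win counts for A and B."""
    wins_a = sum((pick == "left") == t["a_is_left"] for t, pick in zip(trials, picks))
    return wins_a, len(picks) - wins_a


def sign_test_p(wins_a, wins_b):
    """Two-sided exact sign test: how likely is a split this lopsided
    if A and B were actually equally good?"""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

With only a handful of trials, even a lopsided split can be noise, which is one way a test ends up meaningless: a 5 to 5 split gives a p-value of 1.0, and a 9 to 1 split gives roughly 0.02.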

I don't know. We're all working on these problems together. There's a lot of black magic (which is why I rely on hooks a lot). I'm sure I have tons of black magic too; my little AI agent has gotten pretty large.

But what I know for certain is that it works for me. All it takes is for me to go without it, and I honestly don't know how everyone else currently works with AI.

I will link it, but that is not a recommendation that you use it. Mostly only other software engineers use it, and it's very specific to the things I have to do.

At best, maybe it sparks an idea for you to implement on your own.

https://github.com/notque/vexjoy-agent