by koreth1 954 days ago
That's one aspect of reliability, but the one I was more curious about was determinism. If I repeatedly run the same test suite on the same code base and the same data and configuration, am I guaranteed to get the same test results every time, or is it possible for ai() to change its mind about what actions to take?
2 comments

Ah, got it. So GPT is non-deterministic, but we partly handle that with a caching layer in our AI. Basically, if you make an ai() call and we see that the page state is identical to a previous invocation of that exact AI prompt, we do not consult the AI and instead return the cached result. We did this mainly to reduce costs and speed up the 2nd-to-nth run of the same test, but it also makes the AI a bit more deterministic.
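For anyone curious what a cache like that might look like, here is a minimal sketch in Python. It is not the product's actual implementation: the `CachedAI` class, the `flaky_model` stand-in, and the choice to key on a SHA-256 hash of the prompt plus a serialized page state are all assumptions made for illustration.

```python
import hashlib
import itertools
import json
from typing import Callable, Dict


class CachedAI:
    """Wraps a non-deterministic model call with a (prompt, page state) cache."""

    def __init__(self, model: Callable[[str, str], str]) -> None:
        self._model = model              # hypothetical: (prompt, page_state) -> action
        self._cache: Dict[str, str] = {}
        self.model_calls = 0             # number of cache misses so far

    @staticmethod
    def _key(prompt: str, page_state: str) -> str:
        # JSON-encode the pair so the boundary between the two fields is unambiguous.
        payload = json.dumps([prompt, page_state])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def ai(self, prompt: str, page_state: str) -> str:
        key = self._key(prompt, page_state)
        if key not in self._cache:
            # Cache miss: consult the model once and remember its answer.
            self.model_calls += 1
            self._cache[key] = self._model(prompt, page_state)
        return self._cache[key]


# Stand-in for a non-deterministic model: a different answer on every call.
_counter = itertools.count()

def flaky_model(prompt: str, page_state: str) -> str:
    return f"action-{next(_counter)}"


cached = CachedAI(flaky_model)
first = cached.ai("click the login button", "<html>v1</html>")
second = cached.ai("click the login button", "<html>v1</html>")  # same state: cache hit
third = cached.ai("click the login button", "<html>v2</html>")   # changed state: cache miss
```

Note the limit of this approach: the result is only repeatable while the prompt and page state match exactly. Any change to the page produces a new key and a fresh, possibly different, model answer.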

There are some new features in GPT-4-Turbo that will let us handle determinism better, and we will be exploring that once GPT-4-Turbo is stable.

That makes a lot of sense, thank you for the explanation. I will have to explore this the next time I am building page tests. I have considered doing it myself, but I am much happier using a relatively inexpensive product than maintaining a creaky homebuilt version.
Thank you for the clarifying comment; this is really what I meant when I imprecisely said "reliability".