|
|
|
|
|
by randomgermanguy
212 days ago
|
|
Oh actually yeah that's true. You have correctly out-nitpicked my nitpick lol. But at that point i feel like we are getting close to "everything that isn't a perfect Turing-machine is somewhat-stochastic" ;) Edit: someone corrected me above, it does seem to matter more then I thought |
|
if you llm agent takes different decisions from the same prompt, then you have to deal with it
1) your benchmarks become stochastic so you need multiple samples to get confidence for your AB testing
2) if your system assumes at least once completion you have to implement single record and replay so you dont get multiple rollout of with different actions