Hacker News
by visarga 272 days ago
It's because batch size is dynamic, so a different batch size can change the output even at temperature 0.
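
One plausible mechanism, offered as an assumption rather than something confirmed for any particular inference stack: floating-point addition is not associative, so if the batch size changes how a reduction is split into partial sums, the low bits of the result can change. A toy sketch in plain Python (`sequential_sum` and `chunked_sum` are hypothetical helpers, not taken from any real library):

```python
def sequential_sum(values):
    """Add floats strictly left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Sum each chunk first, then add the partial sums together.

    Hypothetical stand-in for a kernel whose reduction layout
    depends on how much work it was handed.
    """
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]

one_chunk = chunked_sum(values, 4)   # same as plain left-to-right addition
two_chunks = chunked_sum(values, 2)  # same numbers, different grouping

print(one_chunk, two_chunks, one_chunk == two_chunks)
```

The inputs are identical in both calls and only the grouping differs, yet the two sums disagree. Real kernels are far more involved, but the same effect could in principle nudge logits by a few bits.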
1 comments

A few possible reasons: batch size is dynamic; in MoE models the experts chosen apparently depend on the whole batch, not only on your own inference request (which sounds weird to me, but I'm just an end user); no one has audited the inference pipeline for floating-point nondeterminism; and I'm not even sure that temperature 0 implies deterministic sampling. The quick formula I found divides by the temperature inside the exponential, e^(logit/temp), so 0 is not a valid value anyway and would need special handling.
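
To make the temperature point concrete, here is a minimal sketch of temperature sampling, assuming the common convention that temperature 0 is special-cased as greedy argmax instead of being plugged into the formula. `sample_token` is a hypothetical helper, not any real library's API, and I don't know how any given provider actually implements it:

```python
import math
import random


def sample_token(logits, temperature, rng=random):
    """Pick a token index from raw logits (hypothetical helper)."""
    if temperature == 0:
        # logit / 0 is undefined, so treat 0 as "always take the largest logit".
        # Ties go to the lowest index.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Greedy decoding is deterministic for fixed logits...
print(sample_token([1.0, 3.0, 2.0], 0))

# ...but a last-bit difference in the logits can decide a near-tie.
a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)  # 0.6
print(sample_token([a, b], 0), sample_token([b, a], 0))
```

Under this assumption the sampling step itself is deterministic at temperature 0, so any run-to-run difference would have to come from the logits changing upstream.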