One hypothesis is that the non-determinism comes from batched inference in sparse mixture-of-experts (MoE) models.
https://152334h.github.io/blog/non-determinism-in-gpt-4/
HN discussion: https://news.ycombinator.com/item?id=37006224
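
To make the hypothesis concrete, here is a toy sketch of how batching could change a sparse MoE layer's output for the same input. It is not taken from the linked post and says nothing about how GPT-4 is actually built. The router, the experts, the `CAPACITY` value, and every name below are made up for illustration. The only assumption that matters is that each expert accepts a limited number of tokens per batch, so tokens from unrelated requests compete for the same slots.

```python
from typing import List

NUM_EXPERTS = 2
CAPACITY = 2  # hypothetical: max tokens each expert accepts per batch


def route(token: float) -> int:
    """Toy top-1 router: choose an expert from the token value alone."""
    return int(token) % NUM_EXPERTS


def expert_forward(expert: int, token: float) -> float:
    """Toy expert: each expert scales its input by a different factor."""
    return token * (expert + 2)


def moe_layer(batch: List[List[float]]) -> List[List[float]]:
    """Apply the toy MoE layer to a batch of sequences.

    Expert capacity is shared across the whole batch, so tokens from
    different sequences compete for the same slots. Tokens that overflow
    skip the expert and pass through unchanged.
    """
    load = [0] * NUM_EXPERTS
    outputs = []
    for sequence in batch:
        out = []
        for token in sequence:
            expert = route(token)
            if load[expert] < CAPACITY:
                load[expert] += 1
                out.append(expert_forward(expert, token))
            else:
                out.append(token)  # dropped by the expert: pass-through
        outputs.append(out)
    return outputs


ours = [1.0, 3.0]

# Same sequence, once alone and once batched behind another request.
alone = moe_layer([ours])[0]
with_neighbors = moe_layer([[5.0, 7.0], ours])[1]

print(alone)
print(with_neighbors)
```

In this sketch the neighboring request fills expert 1 before our tokens arrive, so our tokens are passed through instead of being processed. The result for our sequence then depends on what else happened to be in the batch, even though every individual operation is deterministic.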