by aesthesia
8 days ago
LLM inference can be implemented so that the only source of nondeterminism is the random seed, but that's not common. It's usually easier and more efficient to implement kernels whose exact results depend on how many other prompts are being processed in parallel (that is, on the batch size). See https://thinkingmachines.ai/blog/defeating-nondeterminism-in... for a pretty extensive exploration.
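
To make the mechanism concrete, here's a toy sketch in plain Python (not real kernel code). It assumes the underlying issue is that floating-point addition isn't associative, so a reduction that gets split into different chunks can round differently. `reduce_in_chunks` and the `chunk` parameter are made-up names for illustration; think of `chunk` as standing in for whatever split a kernel might pick at a given batch size.

```python
def reduce_in_chunks(values, chunk):
    """Sum each chunk left to right, then sum the partial results.

    Hypothetical stand-in for a kernel that splits a reduction
    differently depending on how much work is in the batch.
    """
    partials = []
    for start in range(0, len(values), chunk):
        acc = 0.0
        for v in values[start:start + chunk]:
            acc += v
        partials.append(acc)

    total = 0.0
    for p in partials:
        total += p
    return total


# Same inputs, two different groupings of the same additions.
values = [1e17, 1.0, -1e17, 1.0]

one_pass = reduce_in_chunks(values, chunk=4)  # ((1e17 + 1) - 1e17) + 1
split = reduce_in_chunks(values, chunk=2)     # (1e17 + 1) + (-1e17 + 1)

print(one_pass, split)  # 1.0 0.0
```

The inputs are exaggerated so the difference is obvious. In a real model the discrepancies would be tiny, but they can still be enough to change which token wins when two logits are nearly tied.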