Hacker News
by captn3m0 410 days ago
Wouldn’t any randomness (for a fixed combination of hardware and weights) be a result of the temperature and any randomness inserted at inference-time?

Otherwise, an H/T comparison is just a proxy for the underlying token probabilities and the temperature setting (plus hardware differences for a remote-hosted model).
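
To make the point about temperature concrete, here is a minimal sketch of how the H/T split follows from the token logits and the temperature. The logit values and the helper name `softmax_with_temperature` are made up for illustration; they are not taken from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the two candidate tokens "H" and "T" (assumed values).
logits = [2.0, 1.0]

for temperature in (0.5, 1.0, 2.0):
    p_heads, p_tails = softmax_with_temperature(logits, temperature)
    print(f"temperature={temperature}: P(H)={p_heads:.3f}, P(T)={p_tails:.3f}")
```

With the same logits, a lower temperature pushes the split further toward the favoured token and a higher temperature pulls it back toward 50/50, so the observed H/T ratio says as much about the sampling config as about the model.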

3 comments

Author here. Yeah, totally agreed. The more rigorous way to do this would be to run a local model with a fixed seed and temperature, sample the logprobs, and then analyse that data.
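
A rough sketch of the fixed-seed idea, using only Python's standard library rather than a real model. The probability `P_HEADS` and the helper `sample_flips` are hypothetical stand-ins for a model's token probability and its sampler.

```python
import random
from collections import Counter

def sample_flips(p_heads, n, seed):
    """Draw n H/T samples from a fixed probability with a seeded generator."""
    rng = random.Random(seed)  # private generator, so the seed fully controls the draws
    return ["H" if rng.random() < p_heads else "T" for _ in range(n)]

P_HEADS = 0.731  # assumed probability of the "H" token, for illustration only

run_a = sample_flips(P_HEADS, 1000, seed=42)
run_b = sample_flips(P_HEADS, 1000, seed=42)
run_c = sample_flips(P_HEADS, 1000, seed=7)

print(run_a == run_b)                   # same seed reproduces the same sequence
print(Counter(run_a), Counter(run_c))   # different seeds give different counts around the same rate
```

With the seed pinned, any remaining variation between runs has to come from somewhere else (hardware, batching, the serving stack), which is what makes the local setup the cleaner experiment.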

I had an hour to kill and did this experiment.

Congratulations, this was all a test to see if there was anyone on HN with any knowledge of how LLMs work, and you gave the correct answer.
I was gonna say floating-point errors might contribute, especially at fp16 and fp8, but those are technically deterministic.
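
A small illustration of the "technically deterministic" part, in plain Python doubles rather than fp16/fp8: the same operations in the same order always give the same bits, but changing the order of additions can change the result. Whether accumulation order actually varies between runs on a given inference stack is an assumption here, not something the thread establishes.

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # add a and b first
right = a + (b + c)  # add b and c first

print(left == right)        # False: floating-point addition is not associative
print(left == (a + b) + c)  # True: repeating the same order reproduces the same value
```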