| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by nowittyusername 197 days ago
	My man did you even check my video, did you even try the app. This is not "bug related" nowhere did i say it was a bug. Batch processing is a FEATURE that is intentionally turned on in the inference engine for large scale providers. That does not mean it has to be on. If they turn off batch processing al llm api calls will be 100% deterministic but it will cost them more money to provide the services as now you are stuck with providing 1 api call per GPU. "if I hit the black box twice, I get two different replies" what you are saying here is 100% verifiably wrong. Just because someone chose to turn on a feature in the inference engine to save money does not mean llms are anon deterministic. LLM's are stateless. their weights are froze, you never "run" an LLM, you can only sample it. just like a hologram. and depending on the inference sampling settings you use is what determines the outcome.....

1 comments

pegasus 197 days ago

Correct me if I'm wrong, but even with batch processing turned off, they are still only deterministic as long as you set the temperature to zero? Which also has the side-effect of decreasing creativity. But maybe there's a way to pass in a seed for the pseudo-random generator and restore determinism in this case as well. Determinism, in the sense of reproducible. But even if so, "determinism" means more than just mechanical reproducibility for most people - including parent, if you read their comment carefully. What they mean is: in some important way predictable for us humans. I.e. no completely WTF surprises, as LLMs are prone to produce once in a while, regardless of batch processing and temperature settings.

link

nowittyusername 197 days ago

You can change ANY sampling parameter once batch processing is off and you will keep the deterministic behavior. temperature, repetition penalty, etc.... I got to say I'm a bit disappointed in seeing this in hacker news, as I expect this from reddit. you bring the whole matter on a silver platter, the video describes in detail how any sampling parameter can be used, i provide the whole code opensource so anyone can try it themselves without taking my claims as hearsay, well you can bring a horse to water as they say....

link