| HN Mirror

Issue #1 requires batch-invariant kernels. They're already implemented in some inference engines with negligible performance loss, e.g. DeepSeek v4.

For the issue #2, identical outputs are impossible to guarantee for semantical variations, by definition. If you reduce your requirements to the acceptable difference between model's and human's understanding of the prompt, then it becomes a solvable (but not solved) training policy issue. Moreover, it doesn't necessarily need to be solved, because retrying with a slightly different prompt is a thing you don't want to do in the first place.