by varshith17 172 days ago
You are absolutely right. GPU parallelism (especially reduction ops) combined with floating-point non-associativity means the same model can produce slightly different embeddings on different hardware.
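To make the non-associativity point concrete, here is a minimal Python sketch. It only demonstrates the underlying arithmetic on ordinary 64-bit floats; it is not a reproduction of any particular GPU kernel, and the values are picked purely for illustration.

```python
# Floating-point addition is not associative: grouping changes the rounding.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # one reduction order
right = a + (b + c)   # another reduction order

print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6
```

A parallel reduction is free to pick either grouping, so two devices can return sums that differ in the last bits.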

However, that makes deterministic memory more critical, not less.

Right now, we have 'Double Non-Determinism':

1. The Model produces drifting floats.

2. The Vector DB (using f32) introduces more drift during indexing and search (different HNSW graph structures on different CPUs).

Valori acts as a Stabilization Boundary. We can't fix the GPU (yet), but once that vector hits our kernel, we normalize it to Q16.16 and freeze it. This guarantees that Input A + Database State B = Result C every single time, regardless of whether the server is x86 or ARM.

Without this boundary, you can't even audit where the drift came from.
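As a rough illustration of the "normalize to Q16.16 and freeze" idea, here is a Python sketch. It is not Valori's actual kernel: `to_q16_16`, `dot_q16_16`, the round-to-nearest step, and the saturation to a signed 32-bit range are all assumptions made for the example.

```python
SCALE = 1 << 16            # Q16.16: 16 fractional bits
Q_MIN = -(1 << 31)         # assumed signed 32-bit storage
Q_MAX = (1 << 31) - 1

def to_q16_16(x: float) -> int:
    """Quantize a float to a Q16.16 integer (hypothetical helper)."""
    q = round(x * SCALE)               # scaling by a power of two, then rounding
    return max(Q_MIN, min(Q_MAX, q))   # saturate instead of overflowing

def dot_q16_16(u, v) -> int:
    """Dot product of two Q16.16 vectors, result in Q16.16."""
    # Integer adds and multiplies are exact, so summation order cannot matter.
    return sum(a * b for a, b in zip(u, v)) >> 16

# Two floats that differ only in the last bits freeze to the same integer.
drift_a = (0.1 + 0.2) + 0.3
drift_b = 0.1 + (0.2 + 0.3)
frozen_a, frozen_b = to_q16_16(drift_a), to_q16_16(drift_b)
```

Once the values are integers, every later step is exact and gives the same bits on x86 and ARM. Two inputs that straddle a rounding boundary could still freeze to different integers, but that difference is then fixed at the boundary and can be inspected there.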