by varshith17
172 days ago
You are absolutely right. GPU parallelism (especially reduction ops) combined with floating-point non-associativity means the same model can produce slightly different embeddings on different hardware. However, that makes deterministic memory more critical, not less. Right now we have 'Double Non-Determinism': first, the model produces drifting floats; second, the vector DB (using f32) introduces more drift during indexing and search (for example, different HNSW graph structures on different CPUs). Valori acts as a Stabilization Boundary. We can't fix the GPU (yet), but once a vector hits our kernel, we normalize it to Q16.16 and freeze it. This guarantees that Input A + Database State B = Result C every single time, regardless of whether the server is x86 or ARM. Without this boundary, you can't even audit where the drift came from.
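To make the Q16.16 point concrete, here is a minimal Python sketch of the idea, not Valori's actual code. The names `to_q16_16` and `dot_q16_16`, the round-to-nearest rule, and the saturation behavior are all assumptions for illustration. It shows that float addition depends on evaluation order, while integer arithmetic on the quantized values does not.

```python
# Illustrative sketch only; function names and rounding policy are assumptions,
# not Valori's implementation.

Q16_16_SCALE = 1 << 16          # 16 fractional bits
Q16_16_MIN = -(1 << 31)         # smallest signed 32-bit value
Q16_16_MAX = (1 << 31) - 1      # largest signed 32-bit value


def to_q16_16(x: float) -> int:
    """Quantize a finite float to signed Q16.16, saturating at the 32-bit range."""
    scaled = round(x * Q16_16_SCALE)
    return max(Q16_16_MIN, min(Q16_16_MAX, scaled))


def dot_q16_16(a: list[int], b: list[int]) -> int:
    """Dot product of two Q16.16 vectors.

    Each product carries 32 fractional bits. Python ints do not overflow, so
    the sum is exact and independent of accumulation order; a fixed-width
    implementation would need a wide enough accumulator.
    """
    return sum(x * y for x, y in zip(a, b))


# Float addition is not associative, so reduction order changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# After quantization, any summation order gives the same integer.
vec = [to_q16_16(v) for v in (0.1, 0.2, 0.3)]
forward = dot_q16_16(vec, vec)
backward = dot_q16_16(vec[::-1], vec[::-1])
print(forward == backward)  # True
```

Note that quantizing does not undo drift that happened upstream: two slightly different floats from the model can still land on different Q16.16 values. What it gives you is that everything after the boundary is integer math, which behaves identically on x86 and ARM.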