| We’ve been working on Federated Learning (FL) for autonomous agents and hit a hard bottleneck: standard FL (sending gradients back and forth) is bandwidth-heavy and requires massive edge compute. We wrote this paper to propose an architectural shift we call Fluid Federated Learning (FFL). The core engineering contributions are: Prism Protocol: We implemented a "Software-Defined Memory" architecture. It uses io_uring to stream sparse, random projections of model weights directly from NVMe storage to the GPU. This allows us to process "Virtual Batches" of terabyte-scale models on commodity hardware by exploiting the Johnson-Lindenstrauss lemma (Holographic Slicing). Federated State-Space Duality (F-SSD): Instead of averaging gradients (which is slow and leaky), we exploit the duality between Transformers and SSMs (like Mamba) to federate the Recurrent States. The Result: We can run massive foundation models on edge devices with limited VRAM by treating the SSD as a "slow" memory tier without destroying optimization fidelity. I’m curious if anyone here has experimented with io_uring for model serving? We found the async I/O overhead to be negligible compared to the memory gains, but wondering if there are better ways to handle the sparse projections. Happy to answer questions on the implementation. |