by pfekin_2nd 273 days ago
Summation-based aggregation replaces pairwise similarity with position-modulated projections and direct summation, reducing per-layer cost from quadratic to near-linear in sequence length.
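
To make the idea concrete, here is a rough sketch of what one summation step could look like. It is not taken from the repo. The sinusoidal position signal, the elementwise multiply used as "modulation", the causal running sum, and the names `positional_encoding`, `project`, and `summation_aggregate` are all assumptions for illustration.

```python
import math

def positional_encoding(pos, dim):
    """Sinusoidal position vector (an assumed choice of modulation signal)."""
    return [
        math.sin(pos / 10000 ** (i / dim)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / dim))
        for i in range(dim)
    ]

def project(x, weight):
    """Multiply vector x (length d_in) by weight (d_in rows, d_out columns)."""
    return [sum(xi * w for xi, w in zip(x, col)) for col in zip(*weight)]

def summation_aggregate(tokens, weight):
    """Causal summation: output t is the running sum of the
    position-modulated projections of tokens 0..t (assumed design)."""
    dim = len(weight[0])
    running = [0.0] * dim
    outputs = []
    for pos, x in enumerate(tokens):
        h = project(x, weight)                       # per-token projection
        pe = positional_encoding(pos, dim)           # position signal
        modulated = [hi * pi for hi, pi in zip(h, pe)]
        running = [r + m for r, m in zip(running, modulated)]
        outputs.append(list(running))                # one pass, no pairwise scores
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs = summation_aggregate(tokens, identity)
```

Each token is visited once and no n-by-n score matrix is ever built, so the mixing step in this sketch grows linearly with the number of tokens.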

On its own, summation is competitive on classification and multimodal tasks. In language modeling, a hybrid design (summation in most layers, with a single attention layer at the end) matches or slightly outperforms full attention while staying nearly linear in cost.
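
As a back-of-the-envelope illustration of the hybrid layout, the sketch below builds a layer schedule and compares rough token-mixing costs. The helper names `layer_schedule` and `mixing_cost` are hypothetical, and the cost model (n·d per summation layer, n²·d per attention layer, ignoring projections and feed-forward blocks) is a simplifying assumption, not a measurement from the project.

```python
def layer_schedule(num_layers):
    """Summation in every layer except the last, which uses attention."""
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    return ["summation"] * (num_layers - 1) + ["attention"]

def mixing_cost(schedule, seq_len, dim):
    """Rough token-mixing operation count under the assumed cost model."""
    cost = 0
    for kind in schedule:
        if kind == "summation":
            cost += seq_len * dim            # one pass over the sequence
        else:
            cost += seq_len * seq_len * dim  # pairwise scores
    return cost

hybrid = layer_schedule(4)
hybrid_cost = mixing_cost(hybrid, seq_len=8, dim=2)
full_cost = mixing_cost(["attention"] * 4, seq_len=8, dim=2)
```

Under this toy model the quadratic term is paid once rather than once per layer, which is where the savings over a full-attention stack would come from.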

GitHub: https://github.com/pfekin/summation-based-transformers