Show HN: Lila-E8 – 40M Parameter LLM with 0.37 Loss via E8 Lattice Attention

1 points by bootstraptor 112 days ago

I’m excited to release Sovereign-Lila-E8, a transformer architecture that replaces standard attention with a mechanism built on a native E8 root system lattice. While the industry is brute-forcing intelligence with trillions of parameters, I wanted to see whether higher-dimensional geometry could bypass what I call the "viscosity" of standard attention mechanisms.

Most small models today are distilled copies of larger ones. Lila-E8 is different: it builds the E8 root system lattice directly into the attention weights. The E8 lattice gives the densest sphere packing in 8 dimensions, and the idea is to use that packing to minimize "semantic friction" (information loss) in the latent space.
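For readers unfamiliar with the E8 root system, here is a minimal sketch in plain Python that enumerates its 240 roots in the standard coordinates and checks their basic geometry. It illustrates the lattice only and is not taken from the Lila-E8 repository; the names e8_roots and dot are hypothetical helpers made up for this example, and nothing here shows how the roots are wired into attention.

```python
from fractions import Fraction
from itertools import combinations, product


def e8_roots():
    """Return the 240 roots of E8 as tuples of exact fractions."""
    roots = []
    # 112 roots: exactly two nonzero coordinates, each +1 or -1.
    for i, j in combinations(range(8), 2):
        for si, sj in product((1, -1), repeat=2):
            v = [Fraction(0)] * 8
            v[i], v[j] = Fraction(si), Fraction(sj)
            roots.append(tuple(v))
    # 128 roots: every coordinate is +1/2 or -1/2,
    # with an even number of minus signs.
    half = Fraction(1, 2)
    for signs in product((1, -1), repeat=8):
        if signs.count(-1) % 2 == 0:
            roots.append(tuple(s * half for s in signs))
    return roots


def dot(u, v):
    """Euclidean inner product of two coordinate tuples."""
    return sum(a * b for a, b in zip(u, v))


roots = e8_roots()
norms = {dot(r, r) for r in roots}                      # all squared lengths
neighbours = sum(1 for r in roots if dot(roots[0], r) == 1)

print(len(roots), norms, neighbours)
```

Every root has squared length 2, and the 240 roots are the shortest nonzero vectors of the lattice, which is the kissing configuration behind the 8-dimensional sphere packing mentioned above.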

The Results:

Efficiency: 40M parameters achieving 0.37 train / 0.44 val loss on the TinyStories dataset (outperforming standard 60M baselines).

Stability: Sustained coherence for 1000+ tokens without the semantic looping common in small-scale transformers.

At 200,000 steps, with the root system of the exceptional Lie algebra E8 built directly into the attention weights, the model reached what I call "Geometric Resonance": a phase shift in quality that typically requires 2-3x more parameters in standard architectures.

I’ve provided a 1-click Google Colab for instant verification of the weights and generation quality.

GitHub: https://github.com/SPUTNIKAI/sovereign-lila-e8
Colab: https://colab.research.google.com/github/SPUTNIKAI/sovereign...
Zenodo (preprint): https://zenodo.org/records/18731736

I’m looking for feedback on expanding the context window to 4096 and on potentially porting this to the 24-dimensional Leech lattice (see also https://zenodo.org/records/18729723 ).