Show HN: Lila-E8 – 40M Parameter LLM with 0.37 Loss via E8 Lattice Attention
1 point
by bootstraptor
112 days ago
I’m excited to release Sovereign-Lila-E8, a novel transformer architecture that replaces standard attention mechanisms with a native E8 Root System Lattice.
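For readers new to E8: its root system is a set of 240 vectors in 8 dimensions, all of squared length 2. These are the minimal vectors of the E8 lattice, i.e. the contact points of the densest 8D sphere packing (kissing number 240). The post does not say how these vectors enter the attention weights, so this sketch only enumerates the roots in the standard even-coordinate form and checks their basic geometry. `e8_roots` and `dot` are hypothetical helpers for illustration, not code from the repository.

```python
from collections import Counter
from itertools import combinations, product


def e8_roots():
    """Hypothetical helper: the 240 roots of E8 in the standard even coordinate system."""
    roots = []
    # Type 1: two entries equal to +/-1, the rest 0 -> C(8,2) * 4 = 112 roots.
    for i, j in combinations(range(8), 2):
        for si, sj in product((1.0, -1.0), repeat=2):
            v = [0.0] * 8
            v[i], v[j] = si, sj
            roots.append(tuple(v))
    # Type 2: all entries +/-1/2 with an even number of minus signs -> 2**7 = 128 roots.
    for signs in product((0.5, -0.5), repeat=8):
        if sum(1 for s in signs if s < 0) % 2 == 0:
            roots.append(signs)
    return roots


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


roots = e8_roots()
print(len(roots))  # 240
# Every root has squared length 2.
print(all(dot(r, r) == 2.0 for r in roots))  # True
# Inner products of one root with all 240 roots take only the values -2..2.
print(sorted(Counter(round(dot(roots[0], r)) for r in roots).items()))
# [(-2, 1), (-1, 56), (0, 126), (1, 56), (2, 1)]
```

Each root has 56 neighbours at inner product 1 (60 degrees apart), which is the highly symmetric structure the post is appealing to.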
While the industry is brute-forcing intelligence with trillions of parameters, I went "outside" the system to find a zero-viscosity solution. I built Sovereign-Lila-E8 because I wanted to see whether higher-dimensional geometry could bypass the "viscosity" of standard attention mechanisms. Most small models today are just distilled copies of larger ones. Lila-E8 is different: it builds a native E8 root system lattice directly into the attention weights. By using the densest sphere packing in 8 dimensions, we minimize semantic friction (information loss) in the latent space.
The results:
Efficiency: 40M parameters achieving 0.37 train / 0.44 validation loss on the TinyStories dataset (outperforming standard 60M baselines).
Stability: sustained coherence for 1000+ tokens, without the semantic looping common in small-scale transformers.
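The post does not state the units of the loss. Assuming they are mean per-token cross-entropy in nats (natural log, as computed by PyTorch's `cross_entropy`), they convert to perplexity as exp(loss). A minimal conversion, where `perplexity` is a hypothetical helper:

```python
import math


def perplexity(mean_loss_nats: float) -> float:
    """Perplexity = exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(mean_loss_nats)


for split, loss in (("train", 0.37), ("val", 0.44)):
    print(f"{split}: loss {loss:.2f} -> perplexity {perplexity(loss):.2f}")
```

Under that assumption this gives roughly 1.45 (train) and 1.55 (validation). Per-token loss depends on the tokenizer, so it is only directly comparable with baselines that use the same one.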
By implementing the E8 exceptional Lie algebra directly into the attention weights, I reached a state I call "Geometric Resonance" that standard transformers simply cannot reach: at 200,000 steps, the model showed a phase shift in quality that typically requires 2-3x more parameters in standard architectures.
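The "densest sphere packing in 8 dimensions" is the E8 lattice itself, and the classic way to work with it numerically is Conway and Sloane's nearest-point decoder, which uses the decomposition E8 = D8 ∪ (D8 + (1/2, ..., 1/2)). The post does not describe how Lila-E8 actually uses the lattice, so this is only a hypothetical sketch of one building block such a design might use, for example snapping 8-dimensional chunks of a query or key vector onto lattice points. `nearest_d8`, `nearest_e8`, `sq_dist` and `quantize_to_e8` are illustrative names, not functions from the repository.

```python
def nearest_d8(x):
    """Nearest point of D8 = {z in Z^8 : sum(z) even} to x (Conway-Sloane decoder)."""
    f = [float(round(v)) for v in x]
    if int(sum(f)) % 2 == 0:
        return f
    # Odd coordinate sum: re-round the coordinate with the largest rounding error the other way.
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1.0 if x[k] >= f[k] else -1.0
    return f


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_e8(x):
    """Nearest point of E8 = D8 ∪ (D8 + 1/2) to an 8-dimensional vector x."""
    a = nearest_d8(x)
    b = [v + 0.5 for v in nearest_d8([v - 0.5 for v in x])]
    return a if sq_dist(x, a) <= sq_dist(x, b) else b


def quantize_to_e8(vec):
    """Hypothetical use: snap each consecutive 8-dim chunk of vec onto E8."""
    if len(vec) % 8 != 0:
        raise ValueError("vector length must be a multiple of 8")
    out = []
    for i in range(0, len(vec), 8):
        out.extend(nearest_e8(vec[i:i + 8]))
    return out


print(nearest_e8([0.9, 1.1, 0.1, 0.2, 0.0, 0.0, 0.0, 0.0]))
# [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Each 8-dimensional chunk is decoded independently with two roundings and a comparison, so the cost grows linearly with the vector length.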
I’ve provided a 1-click Google Colab for instant verification of the weights and generation quality.
GitHub: https://github.com/SPUTNIKAI/sovereign-lila-e8
Colab: https://colab.research.google.com/github/SPUTNIKAI/sovereign...
Zenodo (preprint): https://zenodo.org/records/18731736
I'm looking for feedback on expanding the context window to 4096 tokens and potentially porting this to the 24D Leech lattice (see also https://zenodo.org/records/18729723).