by marcelmarais
789 days ago
With Llama 3 & Phi-3 just released and achieving incredible benchmark results, it makes sense. However, DeepMind is running some really cool experiments with different architectures. Recently they applied the Griffin architecture to Gemma:
- Griffin combines gated linear recurrences with local attention to optimize performance on long sequences.
- This achieves comparable results to larger models with far fewer training tokens.
- During inference, Griffin provides MUCH higher throughput than Transformer-based models.
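
To make the two ingredients above a bit more concrete, here is a rough toy sketch in plain Python. It is not DeepMind's implementation: the function names, the scalar gate parameters `w_gate` and `b_gate`, and the simplified update rule are all assumptions for illustration. The point it tries to show is that a gated linear recurrence carries a fixed-size state from token to token, and that local attention only looks back over a bounded window, which is roughly why per-token inference cost does not grow with sequence length.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_linear_recurrence(xs, w_gate=1.0, b_gate=0.0):
    """Toy scalar recurrence: h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    The gate a_t depends on the current input, and the update is linear
    in the previous state. This is a simplified stand-in, not the exact
    layer used in Griffin.
    """
    h = 0.0  # the whole "memory" is one number, regardless of sequence length
    states = []
    for x in xs:
        a = sigmoid(w_gate * x + b_gate)  # input-dependent decay in (0, 1)
        h = a * h + (1.0 - a) * x
        states.append(h)
    return states


def local_attention_positions(t, window):
    """Positions a causal sliding-window attention may attend to at step t."""
    return list(range(max(0, t - window + 1), t + 1))


if __name__ == "__main__":
    print(gated_linear_recurrence([1.0, 1.0, 1.0, 1.0]))
    print(local_attention_positions(10, window=4))
```

In contrast, full causal attention at step `t` would attend to every position from `0` to `t`, so its cache keeps growing as the sequence gets longer.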