by ingenieroariel
814 days ago
TL;DR: A model that could be described as "3.8 level", good at math, and openly available under a custom license. It is as fast as a 34B model but uses as much memory as a 132B model. It is a mixture of 16 experts that activates 4 at a time, so it has more chances to get the combination just right than Mixtral (8 experts, 2 active). For my personal use case (a top-of-the-line Mac Studio) it looks like the perfect size to replace GPT-4 Turbo for programming tasks. What we should look out for is people using it for real-world programming tasks (instead of benchmarks) and reporting back.
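To put a rough number on "more chances to get the combo just right", here is a quick sketch that counts how many distinct sets of experts can be active at once. It assumes the router simply picks an unordered set of k experts out of n; the function name `expert_combinations` is made up for illustration and is not taken from either model's code.

```python
from math import comb


def expert_combinations(total_experts: int, active_experts: int) -> int:
    """Count the distinct unordered sets of experts that can be active together."""
    return comb(total_experts, active_experts)


# Figures from the comment: 16 experts with 4 active vs. Mixtral's 8 with 2 active.
sixteen_choose_four = expert_combinations(16, 4)
eight_choose_two = expert_combinations(8, 2)

print(sixteen_choose_four)                      # 1820
print(eight_choose_two)                         # 28
print(sixteen_choose_four // eight_choose_two)  # 65
```

So under that assumption there are 65 times as many possible expert combinations. Whether that translates into better output is exactly the kind of thing real-world usage reports would have to show.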