Hacker News new | ask | show | jobs
by ingenieroariel 814 days ago
TLDR: A model that could be described as "3.8 level" that is good at math and openly available with a custom license.

It is as fast as 34B model, but uses as much memory as a 132B model. A mixture of 16 experts, activates 4 at a time, so has more chances to get the combo just right than Mixtral (8 with 2 active).

For my personal use case (a top of the line Mac Studio) it looks like the perfect size to replace GPT-4 turbo for programming tasks. What we should look out for is people using them for real world programming tasks (instead of benchmarks) and reporting back.

1 comments

What does 3.8 level mean?
My interpretation:

- Worst case: as good as 3.5 - Common case: way better than 3.5 - Best case: as good as 4.0

Gpt-3.5 and gpt-4