

by ingenieroariel 814 days ago

TL;DR: A model that could be described as "3.8 level": good at math and openly available under a custom license.

It is as fast as a 34B model but uses as much memory as a 132B model. It is a mixture of 16 experts with 4 active at a time, so it has more chances to get the combination just right than Mixtral (8 experts, 2 active).
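To put a rough number on "more chances", here is a quick sketch that counts how many distinct sets of experts a router could pick per token in each setup. The function name `expert_combinations` is made up for illustration, and this only counts unordered subsets; it assumes any subset is selectable and says nothing about how good the routing actually is.

```python
from math import comb


def expert_combinations(num_experts: int, active: int) -> int:
    """Count the unordered subsets of `active` experts out of `num_experts`.

    Hypothetical helper for illustration only; it assumes the router may
    pick any subset and ignores how often each subset is chosen in practice.
    """
    return comb(num_experts, active)


sixteen_choose_four = expert_combinations(16, 4)  # 16 experts, 4 active
eight_choose_two = expert_combinations(8, 2)      # 8 experts, 2 active (Mixtral-style)

print(sixteen_choose_four)                      # 1820
print(eight_choose_two)                         # 28
print(sixteen_choose_four // eight_choose_two)  # 65
```

So the 16-choose-4 setup has 65 times as many possible expert combinations as the 8-choose-2 setup, which is the sense in which it has "more chances".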

For my personal use case (a top-of-the-line Mac Studio), it looks like the perfect size to replace GPT-4 Turbo for programming tasks. What we should look out for is people using it for real-world programming tasks (instead of benchmarks) and reporting back.

1 comments

sp332 814 days ago

What does 3.8 level mean?


ingenieroariel 814 days ago

My interpretation:

- Worst case: as good as 3.5
- Common case: way better than 3.5
- Best case: as good as 4.0


ljlolel 814 days ago

GPT-3.5 and GPT-4
