by GaggiX 929 days ago
> - hidden_dim / dim = 14336/4096 => 3.5X MLP expand
> - n_heads / n_kv_heads = 32/8 => 4X
These two ratios are exactly the same as in the old Mistral-7B.
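
For anyone who wants to check the arithmetic, here is a quick sketch. Only the four numbers come from the quoted lines; the dict layout and the variable names `mlp_expansion` and `queries_per_kv_head` are my own and not taken from any model's actual config file.

```python
# The four values quoted above. The dict itself is a hypothetical stand-in,
# not a real config file from any model release.
config = {"dim": 4096, "hidden_dim": 14336, "n_heads": 32, "n_kv_heads": 8}

# MLP expansion: how much wider the feed-forward hidden layer is than the model dim.
mlp_expansion = config["hidden_dim"] / config["dim"]

# Number of query heads that share each key/value head.
queries_per_kv_head = config["n_heads"] // config["n_kv_heads"]

print(mlp_expansion)        # 3.5
print(queries_per_kv_head)  # 4
```

So the 3.5X and 4X figures in the quote both check out.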