by darrenf
63 days ago
> > The performance/intelligence is said to be about the same as the geometric mean of the total and active parameter counts. So, this model should be equivalent to a dense model with about 10.25 billion parameters.

> Sorry, how did you calculate the 10.25B?

The geometric mean of two numbers is the square root of their product. Square root of 105 (35*3) is ~10.25.
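If it helps to see the arithmetic spelled out, here is a minimal Python sketch. The function name `dense_equivalent` is made up for illustration, and the 35 and 3 are just the total and active parameter counts (in billions) quoted above. This only reproduces the rule of thumb as stated; it says nothing about whether the rule holds for any particular model.

```python
import math


def dense_equivalent(total_b: float, active_b: float) -> float:
    """Geometric mean of total and active parameter counts, in billions.

    Hypothetical helper: it only encodes the rule of thumb quoted above.
    """
    return math.sqrt(total_b * active_b)


# 35B total, 3B active: sqrt(35 * 3) = sqrt(105)
estimate = dense_equivalent(35, 3)
print(f"{estimate:.2f}B")  # 10.25B
```

Python 3.8+ also ships `statistics.geometric_mean([35, 3])`, which gives the same value up to floating-point rounding.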