Hacker News
by
miven
789 days ago
Any guesses as to why they bumped the parameter count up from 7B to 8B?
1 comment
matrix2596
789 days ago
I wondered the same thing and checked the model configs. They are using a bigger vocab size, and the intermediate size of the fully connected layer seems to be bigger.
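To make that concrete, here is a rough sketch of how vocab size and intermediate size feed into the total parameter count. It assumes a plain decoder-only transformer with untied input/output embeddings, full multi-head attention (four hidden-by-hidden matrices), a gated three-matrix MLP, and no biases. The `Config` class, `count_params` function, and the numbers in `old` and `new` are illustrative placeholders, not values read from the configs mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Config:
    # Hypothetical config fields, named for illustration only.
    vocab_size: int
    hidden_size: int
    intermediate_size: int
    num_layers: int


def count_params(cfg: Config) -> dict:
    """Approximate parameter count under the assumptions stated above."""
    h = cfg.hidden_size
    # Input embedding plus output projection (assumed untied).
    embeddings = 2 * cfg.vocab_size * h
    # q, k, v, o projections, each h x h (assumes no grouped-query attention).
    attention = cfg.num_layers * 4 * h * h
    # Gate, up, and down projections of a gated MLP.
    mlp = cfg.num_layers * 3 * h * cfg.intermediate_size
    # Two norm weight vectors per layer plus one final norm.
    norms = cfg.num_layers * 2 * h + h
    return {
        "embeddings": embeddings,
        "attention": attention,
        "mlp": mlp,
        "norms": norms,
        "total": embeddings + attention + mlp + norms,
    }


# Illustrative placeholder configs: only vocab and intermediate size differ.
old = Config(vocab_size=32_000, hidden_size=4096, intermediate_size=11_008, num_layers=32)
new = Config(vocab_size=128_256, hidden_size=4096, intermediate_size=14_336, num_layers=32)

old_counts = count_params(old)
new_counts = count_params(new)
for part in old_counts:
    delta = new_counts[part] - old_counts[part]
    print(f"{part:>10}: {old_counts[part]:>13,} -> {new_counts[part]:>13,} (+{delta:,})")
```

Under these assumptions, only the embedding and MLP rows change between the two configs, which matches the two differences noted above. A model using grouped-query attention or tied embeddings would shift the totals, so treat the output as a ballpark.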