by miven 789 days ago
Any guesses as to why they bumped the parameter count up from 7B to 8B?
1 comment

I wondered the same thing and checked the model configs. They use a larger vocab size, and the intermediate size of the fully connected layers also seems to be larger.
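
For anyone who wants to check the arithmetic, here is a rough sketch of how those two config fields feed into the total. Everything in it is an assumption rather than something stated in this thread: the helper name `count_params` is made up, the layout (untied input/output embeddings, a gated MLP with three weight matrices, two norm vectors per layer, no biases) is a common decoder-only design, and the two sets of config numbers are illustrative values for a 7B-class and an 8B-class model. The second config also assumes fewer key/value heads, which is a separate change from the two mentioned above.

```python
def count_params(vocab, hidden, intermediate, layers, heads, kv_heads):
    """Approximate parameter count for a decoder-only transformer.

    Assumed layout (not taken from the thread): untied embeddings,
    gated MLP with three matrices, two norm vectors per layer, no biases.
    """
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim

    embeddings = 2 * vocab * hidden                        # input embedding + output head
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # q, o + k, v projections
    mlp = 3 * hidden * intermediate                        # gate, up, down projections
    norms = 2 * hidden                                     # two norm weight vectors per layer

    return embeddings + layers * (attention + mlp + norms) + hidden  # + final norm


# Illustrative config values (assumptions, not quoted from the thread).
old = count_params(vocab=32000, hidden=4096, intermediate=11008,
                   layers=32, heads=32, kv_heads=32)
new = count_params(vocab=128256, hidden=4096, intermediate=14336,
                   layers=32, heads=32, kv_heads=8)

# Contribution of each change, holding everything else fixed.
vocab_delta = 2 * (128256 - 32000) * 4096
mlp_delta = 32 * 3 * 4096 * (14336 - 11008)

print(f"old total:   {old / 1e9:.2f}B")
print(f"new total:   {new / 1e9:.2f}B")
print(f"vocab delta: {vocab_delta / 1e9:.2f}B")
print(f"MLP delta:   {mlp_delta / 1e9:.2f}B")
```

Under these assumed numbers, the vocab and MLP changes together add more than the net increase, so some other setting has to shrink to land near 8B. In this sketch that is the reduced key/value head count.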