Hacker News
by
matrix2596
788 days ago
I wondered the same thing and checked the model configs. They are using a bigger vocab size, and the intermediate size of the fully connected layer seems to be bigger too.
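For a rough sense of why those two config fields matter, here is a back-of-the-envelope sketch of how vocab size and intermediate size feed into parameter count. Everything in it is an assumption for illustration: the function names are made up, the field names just follow common config conventions, biases are ignored, and the two configs are hypothetical numbers, not the actual models being compared.

```python
def embedding_params(vocab_size, hidden_size, tied=True):
    """Parameters in the token embedding table(s).

    Assumes one vocab_size x hidden_size table; an untied output head
    adds a second table of the same shape.
    """
    tables = 1 if tied else 2
    return tables * vocab_size * hidden_size


def mlp_params(hidden_size, intermediate_size, num_layers, num_matrices=2):
    """Parameters in the fully connected (MLP) blocks, biases ignored.

    Assumes each MLP matrix is hidden_size x intermediate_size.
    num_matrices=2 is a plain up/down MLP; use 3 for a gated variant.
    """
    return num_layers * num_matrices * hidden_size * intermediate_size


# Hypothetical configs, NOT taken from any real model.
config_a = {"vocab_size": 32_000, "hidden_size": 4096,
            "intermediate_size": 11_008, "num_layers": 32}
config_b = {"vocab_size": 256_000, "hidden_size": 4096,
            "intermediate_size": 16_384, "num_layers": 32}

for name, cfg in (("a", config_a), ("b", config_b)):
    emb = embedding_params(cfg["vocab_size"], cfg["hidden_size"])
    mlp = mlp_params(cfg["hidden_size"], cfg["intermediate_size"],
                     cfg["num_layers"])
    print(f"config {name}: embedding={emb:,} mlp={mlp:,}")
```

Both terms grow linearly in the field in question, so with the same hidden size and layer count, a larger vocab or a wider intermediate layer is enough on its own to push the total up.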