by matrix2596 788 days ago
I wondered the same thing and checked the model configs. They are using a bigger vocab size, and the intermediate size of the fully connected layer seems to be bigger too.
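For a rough sense of how those two config fields move the parameter count, here is a back-of-the-envelope sketch. The function name `approx_params` and all the numbers are made up for illustration, not taken from the actual configs, and it assumes a plain decoder-only transformer (full multi-head attention, no biases, norms ignored).

```python
def approx_params(vocab_size, hidden_size, intermediate_size, num_layers,
                  gated_mlp=True, tie_embeddings=False):
    """Rough parameter count for a decoder-only transformer.

    Assumptions (not from any specific model): standard multi-head
    attention with four hidden x hidden projections, no bias terms,
    and layer-norm weights ignored as negligible.
    """
    # Input embedding table, plus a separate output projection if untied.
    embed = vocab_size * hidden_size * (1 if tie_embeddings else 2)
    # Q, K, V and output projections per layer.
    attn = 4 * hidden_size * hidden_size
    # Gated MLPs use three matrices (gate, up, down); plain MLPs use two.
    mlp = (3 if gated_mlp else 2) * hidden_size * intermediate_size
    return embed + num_layers * (attn + mlp)


# Hypothetical configs: same depth and width, only vocab and MLP size differ.
small = approx_params(vocab_size=32_000, hidden_size=4096,
                      intermediate_size=11_008, num_layers=32)
large = approx_params(vocab_size=128_000, hidden_size=4096,
                      intermediate_size=14_336, num_layers=32)
print(f"extra params: {(large - small) / 1e9:.2f}B")
```

The vocab term only shows up once (or twice with untied embeddings), while the intermediate size is multiplied by the number of layers, so both can add up quickly.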