by leod
2619 days ago
Thank you! The model weighs in at 1.2GB with 100M parameters, which is similar in size to the smallest GPT-2 model. I wouldn't be surprised if GPT-2 small (+ finetuning on HN data) performed better than what I have trained. Other than hyperparameters, I think there are two main differences:
First, I pretrained the model solely on Wikipedia data, while GPT-2 used more general web data. Second, I used an encoder-decoder model, while GPT-2 is a decoder-only language model. I suspect that the encoder is not very useful for this task.