by leod 2619 days ago
Thank you!

The model weighs in at 1.2GB with 100M parameters, which is similar to the smallest GPT-2 model.
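
A back-of-the-envelope check on those numbers, as a sketch only: 1.2GB for 100M parameters works out to about 12 bytes per parameter. That would be consistent with a checkpoint that stores fp32 weights plus two Adam moment estimates per weight, but that is an assumption for illustration, not something stated here. The helper name `checkpoint_size_gb` is hypothetical.

```python
def checkpoint_size_gb(num_params, bytes_per_value=4, copies=1):
    """Approximate size in GB (10^9 bytes) of `copies` tensors per parameter."""
    return num_params * bytes_per_value * copies / 1e9


NUM_PARAMS = 100_000_000  # 100M parameters, as stated in the comment

# fp32 weights only: 4 bytes per parameter
weights_only = checkpoint_size_gb(NUM_PARAMS)

# Assumption: weights plus two Adam moment buffers, all fp32 (3 copies)
with_adam_state = checkpoint_size_gb(NUM_PARAMS, copies=3)

print(f"weights only:    {weights_only:.1f} GB")
print(f"with Adam state: {with_adam_state:.1f} GB")
```

If that assumption holds, the weights alone would take roughly 0.4GB at inference time.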

I wouldn't be surprised if GPT-2 small (+ finetuning on HN data) performed better than what I have trained. Other than hyperparameters, I think there are two main differences: First, I pretrained the model solely on Wikipedia data, while GPT-2 used more general web data. Second, I used an encoder-decoder model, while GPT-2 is a decoder-only language model. I suspect that the encoder is not very useful for this task.