by jiuren 2800 days ago
The big picture is similar, but ULMFiT uses an AWD-LSTM for language modeling, whereas BERT uses a masked LM objective instead. BERT also has some other tricks, such as next-sentence prediction.
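To make the difference in objectives concrete, here is a rough toy sketch of how training examples could be built in each case. It is not code from either project: the function names are made up for illustration, and the masking is simplified (BERT masks about 15% of tokens but also sometimes swaps in a random token or keeps the original, which is skipped here).

```python
import random

MASK = "[MASK]"  # placeholder token; the name follows BERT's convention


def causal_lm_pairs(tokens):
    """Left-to-right LM (the ULMFiT-style objective):
    predict each token from the tokens that come before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def masked_lm_example(tokens, mask_prob=0.15, seed=0):
    """Masked LM (the BERT-style objective, simplified):
    hide some tokens and predict them from context on both sides."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # position -> original token to recover
        else:
            masked.append(tok)
    return masked, targets


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for context, target in causal_lm_pairs(sentence):
        print(context, "->", target)
    print(masked_lm_example(sentence, mask_prob=0.3))
```

In the first case the model only ever sees the left context; in the second it sees the whole sentence with holes in it, which is why BERT can condition on both directions at once.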