by jiuren 2800 days ago
The big picture is similar. But ULMFiT uses an AWD-LSTM for language modeling, while BERT uses a Transformer trained with a masked LM objective instead. BERT has some other tricks as well, like next sentence prediction.
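To make the difference in training objectives concrete, here is a rough sketch in plain Python. It is not code from either paper: the function names, the 15% default mask rate as used here, and the seed are assumptions for illustration, and the masking is simplified (BERT also sometimes swaps in a random token or leaves the token unchanged rather than always writing `[MASK]`).

```python
import random

MASK = "[MASK]"

def causal_lm_pairs(tokens):
    """Left-to-right LM (the ULMFiT-style objective):
    predict each token from only the tokens before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_lm_example(tokens, mask_prob=0.15, seed=0):
    """Masked LM (the BERT-style objective, simplified):
    hide some tokens and predict them from context on both sides."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok  # position -> original token to recover
        else:
            corrupted.append(tok)
    return corrupted, targets

if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    for context, target in causal_lm_pairs(sentence):
        print(context, "->", target)
    print(masked_lm_example(sentence, mask_prob=0.3))
```

The left-to-right version only ever sees the prefix, while the masked version sees the whole sentence minus the hidden positions, which is why BERT can use context from both directions.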