"We demonstrate the importance of bidirectional pre-training for language representations." Can someone help me understand what "bidirectional" and "pre-trained" mean?
* bidirectional - build the representation of the current word by looking at both the words after it (the future) and the words before it (the past)
* pre-trained - first train on lots of language modelling data (e.g. billions of words from Wikipedia), then train on the task you really care about, starting from the parameters learnt on the language modelling task rather than from scratch.
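If it helps, here is a rough toy sketch of both ideas in Python. It is not how BERT is implemented: the function names, the example sentence, and the one-parameter "model" are all made up for illustration. The first part shows which words each kind of model may look at; the second shows that starting from pre-trained parameters can get you closer to a good solution in a few steps than starting from zero.

```python
# --- Part 1: which context is visible when representing word i? ---

def left_to_right_context(tokens, i):
    # A unidirectional (left-to-right) model only sees the past.
    return tokens[:i]

def bidirectional_context(tokens, i):
    # A bidirectional model sees the past and the future (everything but word i).
    return tokens[:i] + tokens[i + 1:]

tokens = ["the", "bank", "of", "the", "river"]
i = 1  # position of "bank"
print(left_to_right_context(tokens, i))  # ['the']
print(bidirectional_context(tokens, i))  # ['the', 'of', 'the', 'river']

# --- Part 2: pre-train, then fine-tune from the learnt parameter ---

def train(w, data, steps, lr=0.01):
    # Plain gradient descent on squared error for the toy model y = w * x.
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

# Hypothetical "big" pre-training task and a related "small" target task.
pretrain_data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0)]
target_data = [(x, 2.5 * x) for x in (1.0, 2.0, 3.0)]

w_pretrained = train(0.0, pretrain_data, steps=200)       # long pre-training
w_finetuned = train(w_pretrained, target_data, steps=5)   # short fine-tuning
w_scratch = train(0.0, target_data, steps=5)              # same budget, no pre-training

print(abs(w_finetuned - 2.5), abs(w_scratch - 2.5))
```

In the "bank" example, only the bidirectional context contains "river", which is the word that tells you which sense of "bank" is meant. In the second part, the fine-tuned parameter ends up nearer the target value than the from-scratch one because it began from a related starting point.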