|
|
|
|
|
by kkouddous
1460 days ago
|
|
We’ve been trying to implement an active-learning retraining loop for our critical NLP models for Koko but have never found the time to prioritize the work as it was multi-sprint level of effort. We’ve been working with them for the for a few weeks and we and we are seeing meaningful performance improvement with our models. I highly recommend trying them out. |
|
Consider simple language model case. In order to learn some specific phrases you need to see them in the training, and phrases of interest are rare (usually 1-2 cases per terabytes of data). You simply can not select a half.
A semi-supervised learning and self-supervised learning are more reasonable and widely used. You still consider all the data for training. You just don't annotate it manually.