|
|
|
|
|
by sanchezdev
2410 days ago
|
|
I don't disagree with your point, but the unsupervised aspect of NLP typically isn't useful on its own. Usually it's a form of pre-training to help supervised models perform better with less data. From Google in 2018: "One of the biggest challenges in natural language processing (NLP) is the shortage of training data. Because NLP is a diversified field with many distinct tasks, most task-specific datasets contain only a few thousand or a few hundred thousand human-labeled training examples. However, modern deep learning-based NLP models see benefits from much larger amounts of data, improving when trained on millions, or billions, of annotated training examples. To help close this gap in data, researchers have developed a variety of techniques for training general purpose language representation models using the enormous amount of unannotated text on the web (known as pre-training). The pre-trained model can then be fine-tuned on small-data NLP tasks like question answering and sentiment analysis, resulting in substantial accuracy improvements compared to training on these datasets from scratch." |
|
The unsupervised aspect is the engine driving all modern NLP advancements. Your comment suggests that it is incidental, which is far from the case. Yes, it is often ultimately then used for a downstream supervised task, but it wouldn't work at all without unsupervised training.
Indeed, one of the biggest applications of deep NLP in recent times, machine translation, is (somewhat arguably) entirely unsupervised.