by MiroF 2405 days ago
As I said, I'm an NLP researcher and practitioner, so you don't need to quote this at me.

The unsupervised aspect is the engine driving all modern NLP advancements. Your comment suggests that it is incidental, which is far from the case. Yes, the resulting model is often ultimately used for a downstream supervised task, but that task wouldn't work at all without the unsupervised training.

Indeed, one of the biggest applications of deep NLP in recent times, machine translation, is (somewhat arguably) entirely unsupervised.

1 comments

I didn't mean to make it sound incidental, although I do see your point. I just wanted to chime in on how important a labeled dataset is for a successful ML project.
I think the point is that labeling itself is very difficult except in special, limited domains. Manually constructed labels, like feature engineering, are not robust and do not advance the field in general.
That makes sense. I'm coming from the angle of applied ML, where solutions need to solve a business problem rather than advance the field of ML. In consulting, many problems can't be solved well without a labeled dataset, and in lieu of one, less credible data scientists will claim they can solve the problem in an unsupervised manner.
For sure. There are counterexamples, however: fully unsupervised machine translation for resource-poor languages comes to mind, and it is increasingly finding business applications.

I think that in the future, more and more clever unsupervised approaches will be the path to huge AI advances. We've essentially run out of labeled data for a large variety of tasks.