Hacker News
by brd 1657 days ago
I really appreciate how accessible spaCy has made NLP work, but their NER accuracy is definitely low.

Where stem/lem felt critical to successful NLP a few years ago, we've found it to be much less important for downstream tasks when transformer-based models are involved.

For topic extraction, stem/lem still seems to do a lot to improve accuracy, and for rule-based approaches I can still see how it would facilitate more efficient processing at scale. I'd be curious to hear your experience fine-tuning and/or training new transformer models after stem/lem processing. We've admittedly done little testing to see how transformers actually perform if properly tuned to stem/lem-processed data.
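To make the topic-extraction point concrete, here is a rough sketch of why stemming helps count-based methods: inflected variants collapse into one term, so the counts concentrate. The `toy_stem` function and its suffix list are hypothetical and deliberately crude, written only for this illustration. A real pipeline would use a Porter/Snowball stemmer or a proper lemmatizer, and this is not our production code.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude suffix list for illustration only.
# A real pipeline would use a Porter/Snowball stemmer or a lemmatizer.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def term_counts(text: str, stem: bool) -> Counter:
    """Lowercase, split on alphabetic runs, optionally stem, then count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return Counter(tokens)


doc = "The model trains fast. Training models is cheap. We trained two models."
raw = term_counts(doc, stem=False)
stemmed = term_counts(doc, stem=True)

# Without stemming, "trains", "training" and "trained" are three separate terms.
print(raw["trains"], raw["training"], raw["trained"])  # 1 1 1
# With stemming they merge into a single, stronger signal.
print(stemmed["train"], stemmed["model"])  # 3 3
```

Transformer tokenizers already split words into subword pieces, which is one plausible reason this kind of normalization matters less for those models.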

1 comment

Did you try something like AutoNLP by Hugging Face?
No, we've got our own fine-tuning pipeline. Initial tests showed better performance without traditional stem/lem processing, so we dropped it from our classification pipelines and haven't seen a need to revisit it.