| HN Mirror


by brd 1657 days ago

I really appreciate how accessible spaCy has made NLP work, but its NER accuracy is definitely low.

Where stemming/lemmatization (stem/lem) felt critical to successful NLP a few years ago, we've found it to be much less important for downstream tasks when transformer-based models are involved.

For topic extraction, stem/lem still seems to do a lot to improve accuracy, and for rule-based approaches I can still see how it would facilitate more efficient processing at scale. I'd be curious to hear about your experience fine-tuning and/or training new models with transformers after stem/lem processing. We've admittedly done little testing to see how transformers actually perform when properly tuned on stemmed/lemmatized data.
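To make the topic extraction point concrete, here is a rough sketch of why stemming helps count-based methods: inflected forms collapse into one term, so the counts concentrate. The `toy_stem` function and its suffix list are made up for illustration and are not a real stemmer such as Porter or anything from spaCy; it only assumes whitespace tokenization and plain term counts.

```python
from collections import Counter

# Hypothetical toy suffix list; a real stemmer has many more rules and exceptions.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters of stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def term_counts(text: str, stem: bool) -> Counter:
    """Count whitespace-separated, lowercased terms, optionally stemmed."""
    tokens = text.lower().split()
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return Counter(tokens)


doc = "model models modeling modeled train trains training trained"
raw = term_counts(doc, stem=False)
stemmed = term_counts(doc, stem=True)

print(len(raw), "distinct terms without stemming")
print(len(stemmed), "distinct terms with toy stemming")
print(stemmed)
```

A transformer with a subword tokenizer already sees shared pieces across these word forms, which may be part of why the extra normalization matters less there.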

1 comments

artembugara 1657 days ago

Did you try something like AutoNLP by Hugging Face?


brd 1657 days ago

No, we've got our own fine-tuning pipeline. Initial tests showed better performance without traditional stem/lem processing, so we dropped it from our classification pipelines and haven't seen a need to revisit it.
