Hacker News
by artembugara 1659 days ago
Also I have an article about spaCy NER: https://newscatcherapi.com/blog/named-entity-recognition-wit...

The conclusion I came up with:

"A few notes on my spaCy NER accuracy with "real world" data

1. Low accuracy with sentences without a proper casing

2. Low accuracy overall, even with a large model

3. You'd need to fine-tune your model if you want to use it in production

4. Overall, there's no open-source high accuracy NER model that you can use out-of-a-box"

2 comments

> Overall, there's no open-source high accuracy NER model that you can use out-of-a-box

Part of it is that most people underestimate the complexity of NER, and the rest of it, in my opinion, is that NER is not well-defined as a classification problem.

At least in my experience, having a specific battery of questions to query documents, first by transformer-based semantic search and then narrowed by Q/A models, removed the need for explicit NER, entity linking, or relation extraction. For the case of entities as features for rule systems, shallow models combined with using all label predictions, instead of just selecting the argmax, have been sufficiently robust. Using big transformers for classification doesn't pay off enough to be worth it there.
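
To make the "all label predictions instead of argmax" point concrete, here is a minimal sketch in plain Python. The label set, the raw scores, and the rule threshold are all made up for illustration; a real system would take the scores from whatever shallow model it uses.

```python
import math

# Hypothetical label set, for illustration only.
LABELS = ["PERSON", "ORG", "GPE", "O"]


def softmax(scores):
    """Turn raw scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_features(scores, labels=LABELS):
    """Keep the probability of every label, not just the top one."""
    return dict(zip(labels, softmax(scores)))


def is_probable_org(features, threshold=0.3):
    """Hypothetical rule: fire when ORG has enough probability mass,
    even if ORG is not the single most likely label."""
    return features["ORG"] >= threshold


# Made-up raw scores for an ambiguous span such as "Apple".
scores = [2.0, 1.8, 0.1, -1.0]
features = label_features(scores)

# What an argmax-only pipeline would keep.
top_label = max(features, key=features.get)

print(top_label, is_probable_org(features))
```

With these numbers the argmax is PERSON, yet the ORG rule still fires because ORG keeps a large share of the probability mass. That is the kind of information an argmax-only pipeline throws away.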

I assume your product does some kind of entity disambiguation and/or linking to an ontology? spaCy doesn't provide this out of the box either, AFAICT. Can you share more info about how you do it?

We don't provide entity disambiguation out of the box. It's more of an on-request feature for Enterprise clients.

But overall, entity disambiguation is one of the most useful and difficult tasks in NLP.

spaCy supports entity linking via a knowledge base: https://spacy.io/api/entitylinker

That might be the killer feature from what I've heard.
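
For anyone unfamiliar with the idea, here is a toy sketch of linking a mention through a knowledge base. This is not spaCy's EntityLinker API. The class, the entity IDs, and the prior probabilities are all hypothetical, and it only picks the candidate with the highest prior, whereas a real linker would also use the surrounding context.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKnowledgeBase:
    """Hypothetical in-memory KB: alias -> [(entity_id, prior_probability)]."""
    aliases: dict = field(default_factory=dict)

    def add_alias(self, alias, candidates):
        # Store aliases lowercased so lookup is case-insensitive.
        self.aliases[alias.lower()] = list(candidates)

    def link(self, mention):
        """Return the candidate entity with the highest prior, or None."""
        candidates = self.aliases.get(mention.lower(), [])
        if not candidates:
            return None
        return max(candidates, key=lambda c: c[1])[0]


kb = ToyKnowledgeBase()
# Made-up entity IDs and priors for the ambiguous mention "Paris".
kb.add_alias("Paris", [("paris_france", 0.9), ("paris_texas", 0.1)])

print(kb.link("paris"))     # highest-prior candidate for a known alias
print(kb.link("Atlantis"))  # unknown mention, no candidates
```

Prior-only lookup like this always resolves "Paris" to the same entity, which is exactly why disambiguation using context is the hard part.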
NER good enough to anonymise free text would be the absolute dream for many governments.