by garysieling
4785 days ago
Yes, that is certainly true. The other issue I see with the technique is that if I tried to scale this, I'd probably hit some maturity issues with these libraries. For what it's worth, it looks like DocumentCloud uses Open Calais, which is a Thomson Reuters product. I used to work there in a different division; they have a bunch of interesting products in this space.

I notice your blog is filled with NLP-related goodies. I've been meaning to screw around with the Stanford NER lib, to see if I can train up some custom recognizers that would be of any use for particular document domains.