by knowtheory
4777 days ago
This is pretty cool, but the fundamental problem is still that you (or someone else) have to load an entire PDF (or set of PDFs) before you can use the full-text indexing to search it. If you're running a service (say, like DocumentCloud), you're way better off precomputing a full-text index on ingest and providing a search API than shunting over substantial parts of your stored documents. Definitely cool as a piece of gear, but not terribly practical from a client-side perspective, I'd think.
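
To make the "index on ingest, then expose a search API" idea concrete, here's a rough sketch of what I mean. It's a toy in-memory inverted index, not how DocumentCloud or any real service does it; the `FullTextIndex` class and its `ingest`/`search` methods are names I made up for illustration, and it assumes the text has already been extracted from the PDF on the server.

```python
import re
from collections import defaultdict

# Hypothetical helper: lowercase the text and split it into alphanumeric tokens.
def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

class FullTextIndex:
    """Toy server-side inverted index: token -> set of document ids."""

    def __init__(self):
        self._postings = defaultdict(set)

    def ingest(self, doc_id, text):
        # Done once, when the document is uploaded, so clients never
        # need to download the document itself in order to search it.
        for token in tokenize(text):
            self._postings[token].add(doc_id)

    def search(self, query):
        # Return ids of documents containing every token in the query (AND).
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self._postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self._postings.get(token, set())
        return result
```

A search endpoint would then just call `search()` and send back the matching ids (plus snippets, if you store them), which is a far smaller payload than the PDFs themselves.
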
For what it's worth, it looks like DocumentCloud uses Open Calais, which is a Thomson Reuters product. I used to work there in a different division, and they have a bunch of interesting products in this space.