Hacker News
by pqwEfkvjs 3146 days ago
Found the answer myself from the release docs:

> The Language.pipe method allows spaCy to batch documents, which brings a significant performance advantage in v2.0. The new neural networks introduce some overhead per batch, so if you're processing a number of documents in a row, you should use nlp.pipe and process the texts as a stream.

So if you have an event-based system that can only process a single document at a time, it does not make sense to upgrade yet: in the single-document case, runtime was 10x-100x slower, at least with the 2.0 alpha version.
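
For anyone comparing the two call styles, here is a minimal sketch of what the docs are describing. The helper names (process_one_by_one, process_as_stream) and the batch size are my own, not from the release notes, and the demo uses spacy.blank("en") (tokenizer only) just so it runs without downloading a model. It shows the shape of the calls, not the 10x-100x gap; you would need one of the v2 neural models to measure that.

```python
from typing import Iterable, List


def process_one_by_one(nlp, texts: Iterable[str]) -> List:
    """Call the pipeline once per text, so each call pays the per-batch overhead."""
    return [nlp(text) for text in texts]


def process_as_stream(nlp, texts: Iterable[str], batch_size: int = 64) -> List:
    """Hand the whole stream to nlp.pipe and let spaCy batch the texts internally."""
    return list(nlp.pipe(texts, batch_size=batch_size))


if __name__ == "__main__":
    # Imported here so the helpers above stay importable without spaCy installed.
    import spacy

    # Assumption: a blank English pipeline stands in for a real model.
    # Swap in spacy.load(...) with an installed model to compare timings.
    nlp = spacy.blank("en")
    texts = ["This is a sentence.", "Here is another one."] * 3

    docs = process_as_stream(nlp, texts, batch_size=2)
    print(len(docs), "documents processed")
```

In an event-based setup you are effectively stuck with the first helper, one nlp(text) call per event, which is the case where the per-batch overhead hurts.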

1 comment

But with a nice caveat: In an event-based system, you can run spaCy 2 with AWS Lambda :). This will be much cheaper than keeping a server warm.