| HN Mirror



	by pqwEfkvjs 3145 days ago
	Kudos to Matthew, Ines and the others who made this possible. I haven't checked it out myself yet, so I wanted to ask: have the performance issues that were haunting the 2.0 alpha version been fixed?

2 comments

syllogism 3145 days ago

Current discussion: https://github.com/explosion/spaCy/issues/1508

I'm getting around 8k words per second on the smallest Google Cloud instances. You couldn't run spaCy 1 on these instances (or on AWS Lambda) due to memory usage problems, especially problems predicting memory usage for long-running processes. This is why we say spaCy 2 is cheaper to run in a cents-per-word sense than spaCy 1. This is the performance measure that we think is most important.

However, users are still reporting performance problems, so I wouldn't call the issue resolved. spaCy 1 managed to avoid depending on numpy during prediction, making it easy to ensure that performance didn't depend on anyone's environment. spaCy 2 currently does use numpy, introducing these questions around configuration. I'm working to fix this by implementing the forward pass entirely in Cython.


pqwEfkvjs 3145 days ago

Found the answer myself in the release docs:

> The Language.pipe method allows spaCy to batch documents, which brings a significant performance advantage in v2.0. The new neural networks introduce some overhead per batch, so if you're processing a number of documents in a row, you should use nlp.pipe and process the texts as a stream.
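For anyone wondering what streaming through nlp.pipe looks like in practice, here is a minimal sketch. The helper name count_tokens_streaming and the demo function are made up for illustration, and the batch size is an arbitrary choice, not a recommendation from the docs. The demo uses a blank English pipeline (tokenizer only) so that it runs without a model download, which also means it does not exercise the neural-network overhead the docs are talking about.

```python
from typing import Iterable, Iterator, Tuple


def count_tokens_streaming(
    nlp, texts: Iterable[str], batch_size: int = 1000
) -> Iterator[Tuple[str, int]]:
    """Yield (text, token_count) pairs, letting spaCy batch the documents.

    `nlp` is assumed to be a spaCy Language object (anything with a
    compatible `.pipe` method works). The function name is hypothetical.
    """
    # nlp.pipe consumes the iterable lazily and yields Doc objects in
    # input order, processing them internally in batches.
    for doc in nlp.pipe(texts, batch_size=batch_size):
        yield doc.text, len(doc)


def demo() -> None:
    # Imported lazily so the helper above has no hard dependency on spaCy.
    import spacy

    # Assumption: a blank pipeline is enough to show the calling pattern;
    # swap in a trained pipeline to see the batching effect discussed above.
    nlp = spacy.blank("en")
    texts = ["This is a sentence.", "Here is another one."]
    for text, n_tokens in count_tokens_streaming(nlp, texts, batch_size=2):
        print(n_tokens, text)
```

Calling nlp(text) once per document inside a loop pays the per-batch overhead on every call, whereas the generator above lets the pipeline amortize it across each batch.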

So if you have an event-based system where you can only process a single document at a time, it does not make sense to upgrade yet: in the single-document case, runtime performance was 10x-100x slower, at least with the 2.0 alpha version.


syllogism 3145 days ago

But with a nice caveat: In an event-based system, you can run spaCy 2 with AWS Lambda :). This will be much cheaper than keeping a server warm.
