by syllogism 3145 days ago
Current discussion: https://github.com/explosion/spaCy/issues/1508

I'm getting around 8k words per second on the smallest Google Cloud instances. You couldn't run spaCy 1 on these instances (or on AWS Lambda) because of its memory usage, and in particular because that memory usage was hard to predict for long-running processes. This is why we say spaCy 2 is cheaper to run than spaCy 1 in a cents-per-word sense, which is the performance measure we think matters most.
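
To make the cents-per-word measure concrete, here is a rough sketch of the arithmetic. Only the 8k words per second figure comes from the numbers above. The hourly price is a made-up placeholder, not a real Google Cloud price, and the function name is just for illustration.

```python
def cost_per_million_words(words_per_second: float, hourly_price_usd: float) -> float:
    """Return the compute cost, in US cents, of processing one million words."""
    words_per_hour = words_per_second * 3600
    cents_per_word = hourly_price_usd * 100 / words_per_hour
    return cents_per_word * 1_000_000


# Throughput figure quoted above.
WORDS_PER_SECOND = 8000
# Hypothetical placeholder price; substitute your instance's actual hourly rate.
HYPOTHETICAL_HOURLY_PRICE_USD = 0.01

if __name__ == "__main__":
    cents = cost_per_million_words(WORDS_PER_SECOND, HYPOTHETICAL_HOURLY_PRICE_USD)
    print(f"{cents:.4f} cents per million words")
```

On this measure, a slower pipeline can still come out ahead if it fits on a cheaper instance, since the hourly price scales the result just as directly as the throughput does.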

However, users are still reporting performance problems, so I wouldn't call the issue resolved. spaCy 1 avoided depending on numpy during prediction, which made it easy to ensure that performance didn't depend on anyone's environment. spaCy 2 currently does use numpy, so performance now raises questions about how numpy is configured in each environment. I'm working to fix this by implementing the forward pass entirely in Cython.