by gilesc
5372 days ago
NLTK is great for _learning_ NLP, but Python is much too slow for scalable deep NLP (by which I mean tagging and parsing, as opposed to TF-IDF etc.). Also, parallelization can become a problem because of the GIL. It's a real shame they chose Python, actually, because otherwise it's a superbly structured, documented, and maintained project.
Btw, for performance: whenever pure Python is indeed "much too slow" (did you profile?), there's the option of C extension modules. NumPy and SciPy are good examples: used in hardcore numerical computing, aka the epitome of I-NEED-IT-TO-RUN-FAST!, but still Python.
And not to nitpick ;) but the GIL only affects multi-threading; other modes of "parallelization" are reasonably straightforward and some are even built in (import multiprocessing).
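To make the multiprocessing point concrete, here is a minimal sketch of spreading a CPU-bound text task over worker processes. Each worker is a separate interpreter with its own GIL, so the work can run on several cores at once. The names count_tokens and count_all are made up for illustration, and count_tokens is only a toy stand-in for a real tagging or parsing step, not an NLTK call.

```python
# Sketch only: count_tokens is a hypothetical stand-in for a CPU-bound
# tagging or parsing step; it is not part of NLTK.
from multiprocessing import Pool


def count_tokens(text):
    """Toy per-document task: count whitespace-separated tokens."""
    return len(text.split())


def count_all(docs, workers=2):
    """Map count_tokens over docs using a pool of worker processes.

    Each worker process has its own interpreter and its own GIL.
    """
    with Pool(processes=workers) as pool:
        return pool.map(count_tokens, docs)


if __name__ == "__main__":
    # The guard matters: with the spawn/forkserver start methods the
    # worker processes re-import this module.
    docs = ["NLTK is great for learning NLP", "the GIL only affects threads"]
    print(count_all(docs))
```

Whether this actually helps depends on the workload: arguments and results are pickled between processes, so it tends to pay off only when the per-document work outweighs that overhead.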