
I think spaCy offers a lot for connecting the models to the rest of your application.

spaCy's Doc object is pretty helpful for using the outputs. For instance, you can iterate over the sentences, then iterate over the entities within each sentence, and look at the tokens within them or get the dependency children of the words in the entity. The Doc object is backed by Cython data structures, so it's more memory-efficient and faster than the Python equivalents you'd likely write yourself.
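A minimal sketch of that kind of traversal, assuming spaCy v3 is installed. The example sentence, heads, dependency labels and entity tags are made up for illustration, and the Doc is built by hand so the snippet runs without downloading a trained pipeline. In a real application the Doc would come from calling a loaded pipeline on your text.

```python
import spacy
from spacy.tokens import Doc

# Hand-annotated toy example (illustrative values, not model output).
nlp = spacy.blank("en")
words = ["Apple", "hired", "Jane", "Smith", ".", "She", "moved", "to", "Berlin", "."]
# Absolute index of each token's syntactic head; a root points at itself.
heads = [1, 1, 3, 1, 1, 6, 6, 6, 7, 6]
deps = ["nsubj", "ROOT", "compound", "dobj", "punct",
        "nsubj", "ROOT", "prep", "pobj", "punct"]
# Token-level IOB entity tags.
ents = ["B-ORG", "O", "B-PERSON", "I-PERSON", "O", "O", "O", "O", "B-GPE", "O"]
doc = Doc(nlp.vocab, words=words, heads=heads, deps=deps, ents=ents)

rows = []
for sent in doc.sents:          # sentence boundaries follow from the parse
    for ent in sent.ents:       # entities that fall inside this sentence
        tokens = [t.text for t in ent]
        # Dependency children of every token in the entity span.
        children = [c.text for t in ent for c in t.children]
        rows.append((ent.text, ent.label_, tokens, children))

for row in rows:
    print(row)
```

With a trained pipeline, the loop stays the same; only the way the Doc is created changes.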

I also think our pipeline stuff is a bit more mature than the one in transformers. The transformers pipeline class is relatively new, so I do think our Language object offers a better developer experience.
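As a rough illustration of what working with the Language object looks like in v3, here is a small sketch that registers a custom component and adds it to a blank pipeline. The component name `token_counter` and the `n_tokens` key are hypothetical, chosen only for this example.

```python
import spacy
from spacy.language import Language


# Register a stateless component under a string name (hypothetical name).
@Language.component("token_counter")
def token_counter(doc):
    # Store a simple statistic on the Doc so later code can read it.
    doc.user_data["n_tokens"] = len(doc)
    return doc


nlp = spacy.blank("en")            # tokenizer only, no trained components
nlp.add_pipe("sentencizer")        # built-in rule-based sentence splitter
nlp.add_pipe("token_counter", last=True)

doc = nlp("spaCy pipelines are ordered. Each component receives the Doc.")
print(nlp.pipe_names)
print(len(list(doc.sents)), doc.user_data["n_tokens"])
```

Components run in the order listed in `nlp.pipe_names`, and each one receives the Doc returned by the previous one.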

I think the new training config and improved train command will also be appealing to people, especially with the projects workflow.

The improved transformers support in v3 is very new; it has only just been released in beta form. I do hope people find it useful, but of course no library or solution is ideal for every use case, so I definitely encourage people to pick the mix of libraries that seems right to them.