| Sorry you lost time on this! We took a long time to get Thinc documented and stable, because there was a long period where I wasn't sure where I wanted the library to go. The deep learning ecosystem in 2018 was pretty hard to predict, and we didn't want to encourage spaCy users to adopt Thinc as their machine learning code if we weren't sure what its status would be. So we never really got Thinc v7 stabilised and documented. This became a real issue in the previous version of spacy-transformers. It meant we were pushed into a design for spacy-transformers that really didn't work well. The library wasn't flexible enough, because there was no good way to interact with the transformers at the modelling level. Pretrained transformers are interesting from an API perspective because you really don't want to put the neural network in a box behind a higher-level API. You can use the intermediate representations in many different ways, so long as you can backprop to them. So you want to expose the neural network. Thinc v8 was redesigned and finally documented earlier this year: https://thinc.ai . We now have a clear vision for the library: you can write your models in the library of your choice and easily wrap them in Thinc, so spaCy isn't limited to one particular library. For spaCy's own models, we try to implement them in "pure Thinc" rather than a library like PyTorch or TensorFlow, to keep spaCy itself lightweight (and to stop you from having to juggle competing libraries at the same time). So, it's not quite true that we removed the docs for Thinc v7. We didn't have a good solution for the things you needed to do in the previous spacy-transformers, which prompted a big redesign. |
Yeah, I was trying to do something that didn't quite fit with the spacy-transformers API at the time. I did get a bit of a headache trying to use Thinc back then, which was just when you guys did the redesign, I think, so the docs were different from what I was seeing. I might not have searched enough though.
I haven't tried it yet, but it seems that transformers got added to spaCy v3 with first-class support.
I did gain something from rummaging through the spaCy source though! NN layers were composed into module-like pieces, then added to this REGISTRY variable through a decorator. That way some things could be defined at runtime. It was super elegant.
I nicked the concept of that for my data preprocessing pipeline. Saved me a lot of time when trying new things.
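For anyone curious, here is a minimal sketch of that registry-through-a-decorator idea applied to preprocessing. It is not spaCy's actual code: `REGISTRY`, `register`, `build_pipeline` and the three step functions are all hypothetical names made up for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical global registry: maps a string name to a preprocessing step.
REGISTRY: Dict[str, Callable[[str], str]] = {}


def register(name: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Return a decorator that stores the decorated function under `name`."""

    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        if name in REGISTRY:
            raise ValueError(f"step {name!r} is already registered")
        REGISTRY[name] = func
        return func  # the function itself is left unchanged

    return decorator


@register("lowercase")
def lowercase(text: str) -> str:
    return text.lower()


@register("strip_punct")
def strip_punct(text: str) -> str:
    # Keep only letters, digits and whitespace.
    return "".join(ch for ch in text if ch.isalnum() or ch.isspace())


@register("collapse_whitespace")
def collapse_whitespace(text: str) -> str:
    return " ".join(text.split())


def build_pipeline(names: List[str]) -> Callable[[str], str]:
    """Look up each step by name and chain them in the given order."""
    steps = [REGISTRY[name] for name in names]  # KeyError on an unknown name

    def run(text: str) -> str:
        for step in steps:
            text = step(text)
        return text

    return run


# The list of names could just as well come from a config file at runtime.
pipeline = build_pipeline(["lowercase", "strip_punct", "collapse_whitespace"])
print(pipeline("Hello,   World!"))
```

Because the steps are looked up by string, trying a new preprocessing variant is a matter of writing one decorated function and changing the list of names.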