Just in case this question isn't too far out of your way.
What kind of hardware would be required to run this model, or which cloud GPU provider can you recommend for it?
From @craffel: It's possible to run inference on a single Google Cloud TPU v3-8 device or on a server with 4x 32GB V100 GPUs. Hugging Face also has an inference API for any model on the Hub: https://api-inference.huggingface.co/docs/python/html/index....
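
As a rough sanity check on the GPU figure above, here is a minimal Python sketch that estimates how much memory a model's weights need and whether that fits in the combined memory of several devices. The parameter count (`EXAMPLE_PARAMS`) and the helper names are hypothetical and only for illustration; the thread does not state the model's size. The estimate covers weights only, so activations and framework overhead would add to it.

```python
def weights_memory_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def fits(num_params: float, bytes_per_param: int,
         num_devices: int, gb_per_device: float) -> bool:
    """True if the weights fit in the combined memory of all devices."""
    total_gb = num_devices * gb_per_device
    return weights_memory_gb(num_params, bytes_per_param) <= total_gb


# ASSUMPTION: hypothetical parameter count, not taken from the thread.
EXAMPLE_PARAMS = 11e9

if __name__ == "__main__":
    fp32_gb = weights_memory_gb(EXAMPLE_PARAMS, bytes_per_param=4)
    print(f"fp32 weights: {fp32_gb:.1f} GB")
    # 4x 32GB V100, as mentioned in the reply above
    print("fits on 4x 32GB:", fits(EXAMPLE_PARAMS, 4, 4, 32))
    # a single 32GB card, for comparison
    print("fits on 1x 32GB:", fits(EXAMPLE_PARAMS, 4, 1, 32))
```

Under that assumed size, the fp32 weights alone would not fit on one 32GB card but would fit when sharded across four, which is in line with the multi-GPU setup suggested in the reply.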