from @craffel: It's possible to run inference on a single Google Cloud TPU v3-8 device or on a server with 4x 32GB v100 GPUs. Hugging Face also has an inference API for any model on the Hub: https://api-inference.huggingface.co/docs/python/html/index....