Hacker News
by sinenomine 1248 days ago
The model is large, and each instance likely requires several GPUs (or other high-end accelerators) to run at a moderate speed, though I'm not sure how far they have optimized it.
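
For a rough sense of why size translates into several GPUs, here is a back-of-envelope sketch. Everything in it is an assumption for illustration, not a figure from the comment above: the parameter count, the bytes per parameter, the per-GPU memory, and the 20% overhead factor for activations and caches are all hypothetical, and the helper name `gpus_needed` is made up.

```python
import math

def gpus_needed(n_params: float, bytes_per_param: int,
                gpu_mem_gib: float, overhead: float = 1.2) -> int:
    """Smallest number of GPUs whose combined memory fits the weights.

    `overhead` is an assumed fudge factor for activations, caches, etc.
    """
    weight_gib = n_params * bytes_per_param / 2**30  # weights only, in GiB
    return math.ceil(weight_gib * overhead / gpu_mem_gib)

# Hypothetical inputs: 175e9 parameters stored as 16-bit floats (2 bytes each)
# on accelerators with 80 GiB of memory apiece.
print(gpus_needed(175e9, 2, 80))
```

This only bounds the memory needed to hold the weights; it says nothing about throughput, and quantization or other optimizations would shift the numbers.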

Read the papers.