by themulticaster
1707 days ago
I'm not familiar with current state-of-the-art language models, so please bear with me for asking: What's the catch here? Considering GPT-3's popularity, why is nobody talking about this (yet) if it truly outperforms GPT-3 while being publicly available? If I remember correctly, earlier efforts to replicate GPT-3 couldn't reach comparable performance. Perhaps it's still a huge hassle to run inference with this model because of its size, so it doesn't make sense to use it (compared to paying for OpenAI's API) unless you happen to have a few spare GPUs lying around?

Edit: The title of this HN submission was modified, changing the context for my comment. Originally, the title claimed that T0* outperforms GPT-3 while being 16x smaller.
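To put the "16x smaller" claim in rough numbers, here is a back-of-the-envelope sketch of how much memory the weights alone would take. It assumes GPT-3's commonly cited size of about 175B parameters and takes "16x" literally (about 11B parameters for T0); it ignores activations, caches, and framework overhead, so real serving needs are higher.

```python
# Back-of-the-envelope, weights-only memory estimate.
# Assumptions (not stated in the thread): GPT-3 has ~175B parameters,
# and "16x smaller" is taken literally for T0.
GPT3_PARAMS = 175e9
T0_PARAMS = GPT3_PARAMS / 16  # roughly 11B parameters

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}


def weights_gb(n_params: float, dtype: str) -> float:
    """Return the size of the weights alone in GB (1 GB = 1e9 bytes).

    Ignores activations, caches, and framework overhead.
    """
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, n in (("GPT-3", GPT3_PARAMS), ("T0", T0_PARAMS)):
        for dtype in ("fp32", "fp16"):
            print(f"{name} {dtype}: {weights_gb(n, dtype):.1f} GB")
```

Under these assumptions, T0's weights come to roughly 44 GB in fp32 versus roughly 700 GB for GPT-3, which is why a 16x size difference matters so much for who can self-host.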
The paper/model/code was just made public today. This may be why no one is talking about it yet.
Regarding whether the size is a hassle: it's possible to run inference on a single Google Cloud TPU v3-8 device or on a server with 4x 32 GB V100 GPUs. Hugging Face also has an inference API for any model on the Hub: https://api-inference.huggingface.co/docs/python/html/index....
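For anyone wanting to try the hosted route, a minimal sketch using only the Python standard library might look like the following. It assumes the URL layout `https://api-inference.huggingface.co/models/<model-id>` with a JSON body of the form `{"inputs": ...}` and a Bearer token, and it uses `bigscience/T0pp` as the Hub ID of a T0 checkpoint; treat all of these as assumptions and check the current Hugging Face docs, since the hosted service may have changed. The token value is a placeholder.

```python
import json
import urllib.request

# Assumed endpoint and model ID; verify against current Hugging Face docs.
API_URL = "https://api-inference.huggingface.co/models/bigscience/T0pp"


def build_request(prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the prompt as JSON."""
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {token}",  # placeholder user token
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(prompt: str, token: str):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(prompt, token)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage would be something like `query("Is this review positive or negative? Review: great product", "hf_...")`. Splitting request construction from sending keeps the payload easy to inspect without hitting the network.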