by themulticaster 1707 days ago
I'm not familiar with the current state-of-the-art language models, so please bear with me for asking: What's the catch here? Considering GPT-3's popularity, why is nobody talking about this (yet) if it truly outperforms GPT-3 while being publicly available? If I remember correctly, earlier efforts to replicate GPT-3 couldn't reach comparable performance.

Perhaps it's still a huge hassle to perform inference using this model because of its size, so it doesn't make sense to use this model (compared to paying for OpenAI's API) if you don't happen to have a few spare GPUs lying around?

Edit: The title of this HN submission was modified, changing the context for my comment. Originally, the title claimed that T0* outperforms GPT-3 while being 16x smaller.

4 comments

(author here)

The paper/model/code was just made public today. This may be why no one is talking about it yet.

Regarding whether the size is a hassle: it's possible to run inference on a single Google Cloud TPU v3-8 device or on a server with 4x 32GB V100 GPUs. Hugging Face also has an inference API for any model on the Hub: https://api-inference.huggingface.co/docs/python/html/index....
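
For anyone who wants to try the hosted route, here is a minimal sketch of what a request could look like. It only builds the request and does not send it. The endpoint pattern, the model id `bigscience/T0pp`, and the token `hf_xxx` are assumptions for illustration and are not taken from the comment above, so check the current API docs before relying on them.

```python
import json
import urllib.request

# Assumption: the hosted Inference API accepts POST requests at
# https://api-inference.huggingface.co/models/<model_id>. Verify against the docs.
API_ROOT = "https://api-inference.huggingface.co/models/"


def build_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a hosted model."""
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_ROOT + model_id,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# "bigscience/T0pp" and "hf_xxx" are hypothetical placeholders.
req = build_request("bigscience/T0pp", "Who is better, you or GPT-3?", "hf_xxx")
# To actually send it: urllib.request.urlopen(req).read()
```

Keeping the network call out of the helper makes it easy to inspect the URL and payload before spending any API quota.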

Do you have (rough) numbers for inference latency on 4x 32GB V100s?
(author here)

I don't have exact numbers for latency, but the inference widget is currently on a TPU v3-8 (which, if I am not mistaken, could roughly be compared to a cluster of 8 V100s). That gives you a rough idea of the latency for short inputs.

A colleague also just reminded me that it is possible to run inference for T5-11B (which is the size we use) on a single (big) GPU with enough CPU RAM, using offloading -> https://github.com/huggingface/transformers/issues/9996#issu...
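
A back-of-the-envelope sketch of why the hardware numbers in this thread look the way they do. It counts weights only and ignores activations and framework overhead. It also assumes "11B" means exactly 11 billion parameters and that 1 GB is 1e9 bytes, neither of which is stated in the comments.

```python
def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


N_PARAMS = 11e9  # assumption: "T5-11B" taken as exactly 11 billion parameters

fp32_gb = weight_memory_gb(N_PARAMS, 4)  # 4 bytes per float32 weight
fp16_gb = weight_memory_gb(N_PARAMS, 2)  # 2 bytes per float16 weight

total_4x_v100_gb = 4 * 32  # the 4x 32GB V100 server mentioned above

# Weights fit comfortably when sharded across four 32GB cards...
fits_across_four = fp32_gb < total_4x_v100_gb
# ...but float32 weights alone exceed a single 32GB card, hence offloading.
fits_on_one_32gb_fp32 = fp32_gb < 32

print(f"fp32: {fp32_gb:.0f} GB, fp16: {fp16_gb:.0f} GB")
```

Under these assumptions, the float32 weights need more than one 32GB card can hold, which is consistent with needing either several GPUs or CPU offloading.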

On the topic of GPT-3, I asked your creation:

"Who is better, you or GPT-3?"

> GPT-3

It somehow picked up Modesty.
Can this be used to generate prose at length? Or Reddit comment replies?
While in theory it could, the nature of its training favors shorter, more factual replies.
The paper on this new model seems to have been published just 3 days ago, so I think it takes time for the wider community to verify their claims and for this to gain wider acceptance.
Beyond it being new, it's because this task isn't one of the main ones you'd use GPT-3 for. It is in fact one that both models are mediocre at, and the results are likely rarely usable in any context. The title is just a tad misleading.*

Not to take away from the achievement, it's still great. It just doesn't supersede GPT-3 on the more freeform generation it excels at, nor does it seem to aim to.

* The original title that Hugging Face posted this under implied it is better than GPT-3 in general, not just on a specific task, but it was changed after this comment was posted.

You can run it right now with your own queries: see https://twitter.com/abidlabs/status/1450118978051903488