

by craffel 1747 days ago

(author here)

The paper, model, and code were only made public today, which may be why no one is talking about it yet.

Regarding whether the size is a hassle: it's possible to run inference on a single Google Cloud TPU v3-8 device or on a server with 4x 32GB V100 GPUs. Hugging Face also has an inference API for any model on the Hub: https://api-inference.huggingface.co/docs/python/html/index....
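For readers who want to try the hosted route, here is a minimal sketch of calling the inference API from Python using only the standard library. It rests on assumptions that are not stated in this thread: the `https://api-inference.huggingface.co/models/<model_id>` endpoint shape and the `{"inputs": ...}` payload reflect how the API was commonly used around this time and may have changed since, and the model ID and token in the usage note are hypothetical placeholders.

```python
import json
import urllib.request

# Assumption: endpoint root as commonly used at the time; check the current docs.
API_ROOT = "https://api-inference.huggingface.co/models/"


def build_request(model_id: str, prompt: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the prompt as JSON."""
    body = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_ROOT + model_id,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(model_id: str, prompt: str, token: str, timeout: float = 60.0):
    """Send the request and return the decoded JSON response."""
    request = build_request(model_id, prompt, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `query("some-org/some-model", "Who is better, you or GPT-3?", "<your token>")`, where both the model ID and the token are placeholders. A large model may need time to load on the first call, so a generous timeout and a retry are reasonable.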

3 comments

NavinF 1747 days ago

Do you have (rough) numbers for inference latency on 4x 32GB V100?

VictorSh 1747 days ago

(author here)

I don't have exact latency numbers, but the inference widget currently runs on a TPU v3-8 (which, if I am not mistaken, is roughly comparable to a cluster of 8 V100s). That should give you a rough idea of the latency for short inputs.

Note: a colleague just reminded me that it is possible to run inference for T5-11B (the size we use) on a single (big) GPU with enough CPU memory, using offloading -> https://github.com/huggingface/transformers/issues/9996#issu...
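To see why the hardware mentioned in this thread is needed, and why offloading helps, here is a back-of-the-envelope sketch of the memory taken by the weights alone. It assumes a nominal 11 billion parameters and counts only the weights; activations, framework overhead, and any buffers come on top, so real requirements are higher. The function names are illustrative, not from any library.

```python
import math

N_PARAMS = 11e9   # assumption: nominal parameter count for an "11B" model
GPU_GB = 32       # one 32GB V100, as mentioned in the thread


def weights_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory for the weights only, in GB (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


def gpus_for_weights(size_gb: float, gpu_gb: float = GPU_GB) -> int:
    """Smallest number of GPUs whose combined memory holds the weights."""
    return math.ceil(size_gb / gpu_gb)


def offloaded_gb(size_gb: float, gpu_gb: float = GPU_GB) -> float:
    """Weights that must live in CPU memory if only one GPU is available."""
    return max(0.0, size_gb - gpu_gb)


fp32 = weights_gb(N_PARAMS, 4)   # 44.0 GB in 32-bit floats
fp16 = weights_gb(N_PARAMS, 2)   # 22.0 GB in 16-bit floats

print(f"fp32 weights: {fp32:.0f} GB, GPUs for weights alone: {gpus_for_weights(fp32)}")
print(f"fp16 weights: {fp16:.0f} GB, GPUs for weights alone: {gpus_for_weights(fp16)}")
print(f"fp32 on one GPU: at least {offloaded_gb(fp32):.0f} GB offloaded to CPU memory")
```

Under these assumptions the 32-bit weights do not fit on one 32GB card, which is why a single-GPU setup has to keep part of the model in CPU memory, while 4x 32GB leaves headroom for activations.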

ourlordcaffeine 1747 days ago

On the topic of GPT-3, I asked your creation:

"Who is better, you or GPT-3?"

> GPT-3

ai_ia 1747 days ago

It somehow picked up modesty.

echelon 1747 days ago

Can this be used to generate prose at length? Or Reddit comment replies?

srush 1747 days ago

While in theory it could, the nature of its training favors shorter, more factual replies.
