| HN Mirror



	by billythemaniam 1214 days ago
	Flan-T5 is much smaller than GPT-3, but was trained on significantly more data, resulting in competitive accuracy. It is also Apache licensed. I wonder whether that model is fast enough, across enough use cases, to be cost-effective?

3 comments

danielbln 1214 days ago

You can give the XL (3B parameters) model a try here (I'd recommend a Colab Pro account): https://colab.research.google.com/drive/1Hl0xxODGWNJgcbvSDsD...

In my Colab Pro it's running on an A100 (which is a very beefy GPU), and inference is very fast and definitely suitable for interactive use. On a T4 GPU (which is much cheaper) inference is still alright and probably OK for interactive use.


ntonozzi 1214 days ago

I think Flan-T5 is fast enough, but I don't think it generates text or handles abstract reasoning at nearly the same level as current GPT-3 models. If it still scores competitively, that indicates a deficiency in the benchmarks and metrics that we use to evaluate LLMs. For generating embeddings it might work well enough, though.


billythemaniam 1214 days ago

It's certainly not quite as good out of the box, at least with the open-sourced checkpoints. However, so far I've found it can achieve similar accuracy with enough examples and/or fine-tuning for my use cases. Like everything, it depends on what you are doing too.
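
To make "enough examples" concrete, here is a minimal sketch of assembling a few-shot prompt for an instruction-tuned model. The function name, the `Input:`/`Output:` layout, and the sample reviews are all made up for illustration; nothing here is specific to Flan-T5, and the format that works best will depend on the task.

```python
def build_few_shot_prompt(examples, query, instruction=""):
    """Join (input, output) pairs and a final query into one prompt string.

    The "Input:/Output:" layout is an arbitrary choice for this sketch.
    """
    parts = []
    if instruction:
        parts.append(instruction.strip())
    for text, label in examples:
        parts.append(f"Input: {text}\nOutput: {label}")
    # Leave the last output empty so the model completes it.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Hypothetical sentiment examples, not taken from any real dataset.
examples = [
    ("The battery died after a week.", "negative"),
    ("Setup took two minutes and it just works.", "positive"),
]
prompt = build_few_shot_prompt(
    examples,
    "The screen is sharp but the speakers are tinny.",
    instruction="Classify the sentiment of each review as positive or negative.",
)
print(prompt)
```

The resulting string would be passed to whatever model you are using; adding more pairs to `examples` is the "enough examples" knob.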


throwaway1851 1214 days ago

For embeddings, it may be overkill. Smaller BERT-type models can provide good embeddings when fine-tuned with a contrastive learning objective. E.g.: https://sbert.net.
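
For anyone unfamiliar with what a contrastive objective looks like, here is a rough pure-Python sketch of an InfoNCE-style loss with in-batch negatives. It is an illustration of the general idea, not necessarily the exact loss any sbert.net model was trained with; the temperature of 0.05 and the toy 2-D vectors are assumptions chosen for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean cross-entropy where positives[i] is the match for anchors[i].

    Every other entry of `positives` acts as an in-batch negative.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy embeddings: in `aligned` each anchor sits near its own positive,
# in `swapped` each anchor sits near the other anchor's positive.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]

loss_aligned = info_nce_loss(anchors, aligned)
loss_swapped = info_nce_loss(anchors, swapped)
print(loss_aligned < loss_swapped)
```

Fine-tuning pushes the encoder toward the low-loss case: matching pairs end up close together and everything else in the batch gets pushed apart.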


williamcotton 1214 days ago

Fine-tuning smaller models like GPT-J (also trained on The Pile) worked well for Toolformer:

https://arxiv.org/abs/2302.04761
