by billythemaniam 1214 days ago
Flan-T5 is much smaller than GPT-3, but it was trained on significantly more data, resulting in competitive accuracy. It is also Apache licensed. I wonder if that model is fast enough, for enough use cases, to make it cost-effective?
3 comments

You can give the XL (3B parameters) model a try here (would recommend a Colab Pro account): https://colab.research.google.com/drive/1Hl0xxODGWNJgcbvSDsD...

In my Colab Pro it's running on an A100 (which is a very beefy GPU), and inference is very fast and definitely suitable for interactive use. On a T4 GPU (which is much cheaper) inference is still alright and probably OK for interactive use.
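
For anyone who wants to check the speed claim on their own hardware, here is a rough timing sketch. It assumes the Hugging Face `transformers` and `torch` packages and the `google/flan-t5-xl` checkpoint; the function names `time_flan_t5` and `tokens_per_second` are made up for illustration, and this is not the code from the linked notebook.

```python
import time


def tokens_per_second(num_tokens, seconds):
    """Throughput helper; returns inf if the timer reported zero elapsed time."""
    return num_tokens / seconds if seconds > 0 else float("inf")


def time_flan_t5(prompt, model_name="google/flan-t5-xl", max_new_tokens=64):
    """Generate once and report (text, approximate tokens per second).

    Heavy imports are done lazily so the helper above works without them.
    """
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)

    start = time.perf_counter()
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    if device == "cuda":
        # Wait for queued GPU work so the timing covers the whole generation.
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    # The count includes the decoder start token, so treat it as approximate.
    return text, tokens_per_second(output_ids.shape[1], elapsed)
```

Calling `time_flan_t5("Translate to German: How old are you?")` on each GPU type would give a crude comparison. The first call also pays for the model download and load, which are outside the timed region here.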

I think Flan-T5 is fast enough, but I don't think it generates text or does abstract reasoning at nearly the same level as current GPT-3 models. That it still scores competitively indicates a deficiency in the benchmarks and metrics that we use to evaluate LLMs. For generating embeddings it might work well enough though.
It's certainly not quite as good out of the box, at least the open-sourced checkpoints. However, so far I've found it can achieve similar accuracy with enough examples and/or fine-tuning for my use cases. Like everything, it depends on what you are doing too.
For embeddings, it may be overkill. Smaller BERT-type models can provide good embeddings when fine-tuned with a contrastive learning objective. E.g.: https://sbert.net.
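
To make "contrastive learning objective" concrete, here is a minimal pure-Python sketch of one common variant, an in-batch softmax loss over cosine similarities. It is only an illustration: the function names and the `temperature` value are assumptions, and it is not claimed to be the exact loss any particular sbert.net model was trained with.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """Mean cross-entropy where positives[i] is the match for anchors[i]
    and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        total += log_sum_exp - logits[i]
    return total / len(anchors)


# Toy 2-D "embeddings": the loss is low when pairs line up, high when swapped.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(anchors, [[0.9, 0.1], [0.1, 0.9]])
swapped = in_batch_contrastive_loss(anchors, [[0.1, 0.9], [0.9, 0.1]])
```

Fine-tuning pushes the encoder toward the `aligned` case: matching pairs end up close together and everything else in the batch is pushed apart, which is what makes the resulting vectors useful for similarity search.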
Fine-tuning smaller models like GPT-J (also trained on The Pile) worked well for Toolformer:

https://arxiv.org/abs/2302.04761