Hacker News new | ask | show | jobs
by ai_what 745 days ago
It won't. Amazon kind of when that angle with MistralLite[1] (a 7B finetune), and it was barely passing in terms of being an effective summarizer. 0.5B are pretty much useless.

https://huggingface.co/amazon/MistralLite

2 comments

The official Mistral-7B-v0.2 model added support for 32k context, and I think it's far better than MistralLite. Third-party finetunes are rarely amazing at the best of times.

Now, we have Mistral-7B-v0.3, which is supposedly an even better model:

https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3

My experience is that < 500M models are pretty useful when fine-tuned on traditional NLP tasks, such as text classification and sentence/token level labeling. A modern LM with a 32K context window size could be a nice replacement for BERT, RoBERTa, BART.