> If you specifically mean a general LLM trained on a general language corpus with instruction finetuning, this is correct.

Yes, thanks, that's what I meant.

> If you are training an LLM on a domain-specific corpus or finetuning on specific downstream tasks, even relatively tiny models at 330M params are definitely useful and not “toys”, and can be used to accurately perform tasks such as semantic text search, document summarization, and named entity recognition.

Agreed, the BERT family is a good example here.