Hacker News new | ask | show | jobs
by mmcwilliams 842 days ago
This is anecdata but "good enough" is relative. I've finetuned TinyLlama with the same dataset and technique as Llama2 7B for on-device purposes (not for cost or privacy but for physical hardware that have to run offline and with low power consumption) and it produces higher task alignment in 1/4 the inference time. As a general purpose model it isn't great but small models have their place in the ecosystem.
1 comments

Care to elaborate on the finetune? It's surprisingly very hard to come across a useful finetuning examples.
Sure, very generally we're doing PEFT starting with insights from examples very much like this one [0] and have gradually built our own tooling and customized the approach a lot as the underlying Huggingface libraries have progressed even in the last 6 months.

I will say that one of the most important parts of the process that I've found is in the prompt structuring, the use of special tokens based on how the base models were trained and customizing the tokenizer where necessary. That work in particular is not covered adequately by the examples I was able to find when I started, in my opinion.

[0] https://medium.com/@kshitiz.sahay26/fine-tuning-llama-2-for-...