by PaulHoule 696 days ago
I've done most of my work so far with BERT models, but if I were trying to fine-tune a generative model I think I'd try a T5 model.

https://huggingface.co/docs/transformers/en/model_doc/t5

https://medium.com/nlplanet/a-full-guide-to-finetuning-t5-fo...

Specifically you can show a T5 model input and output texts and it will try to learn the transformation between them. People tell me T5 models are relatively easy to train and they perform well on many tasks.
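To make the "input and output texts" idea concrete, here is a minimal sketch of how the training pairs might be laid out before they go to a tokenizer and trainer. The `Example` class, the `make_examples` helper, and the sample sentences are made up for illustration; the "summarize: " prefix follows the task-prefix convention T5 was pretrained with, but any consistent prefix (or none) could be tried.

```python
from dataclasses import dataclass


@dataclass
class Example:
    source: str  # text the model reads
    target: str  # text the model should learn to produce


def make_examples(pairs, prefix="summarize: "):
    """Turn (input, output) text pairs into prefixed text-to-text examples.

    Hypothetical helper: it only prepares strings. Tokenizing and training
    would be done by whatever library you use.
    """
    return [
        Example(source=prefix + src.strip(), target=tgt.strip())
        for src, tgt in pairs
    ]


# Made-up sample data, not from any real dataset.
pairs = [
    ("The service restarts nightly at 02:00 UTC to rotate logs.  ",
     "Nightly restart at 02:00 UTC."),
    ("Invoices are emailed on the first day of each month.",
     "Invoices go out monthly."),
]

examples = make_examples(pairs)
for ex in examples:
    print(ex.source, "->", ex.target)
```

Each `source` string would be tokenized as the encoder input and each `target` as the labels.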

Note that another approach to your problem is RAG (retrieval-augmented generation):

https://www.promptingguide.ai/techniques/rag

If you have specific documentation on your topic, you could use embeddings to find text that is relevant to the query. In fact, this stacks well with fine-tuning, because you could train the model to give an answer when given a question and a relevant document. T5 is good at that kind of task, which is basically summarization.
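A rough sketch of the retrieval step, under loud assumptions: the `embed` function below is a toy bag-of-words counter standing in for a real embedding model, and `retrieve`, `build_input`, and the sample documents are hypothetical. The "question: ... context: ..." layout mirrors the format T5 used for question answering, but it is only one possible choice.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real system would use an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, docs):
    """Return the document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))


def build_input(question, document):
    """Combine the question and the retrieved text into one model input string."""
    return f"question: {question} context: {document}"


# Made-up documentation snippets for illustration.
docs = [
    "To reset your password, open Settings and choose Security.",
    "Invoices are emailed on the first day of each month.",
]

question = "How do I reset my password?"
best = retrieve(question, docs)
model_input = build_input(question, best)
print(model_input)
```

The resulting string is what you would feed to the model, either as-is or as the input half of a fine-tuning pair whose output half is the desired answer.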

1 comments

Thank you for explaining this. I’m looking at the links and the models to see what works best for me.
Check out llama-index; they have a ton of great content on RAG (Retrieval Augmented Generation), which is probably what you want to look at instead of training. It's much cheaper, and new documents don't require more training.