by williamcotton 1211 days ago
Fine-tuning smaller models like GPT-J (also trained on The Pile) worked well for Toolformer:

https://arxiv.org/abs/2302.04761