Hacker News
by the_tli 1055 days ago
Is anyone aware of a practical how-to on implementing a data flywheel for fine-tuning (improving the model with user feedback)?
1 comment

Not seen a great explainer on this yet.

You'd either need access to the model weights or a fine-tuning API.

Then, depending on which fine-tuning approach you want to use, the user data you need to collect will be different: RLHF needs several outputs for the same query, compared or ranked by preference, whereas instruction fine-tuning needs high-quality input-output pairs to train on. You could ask for the user's feedback after running the LLM and use it to pick out good training data.
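
A rough sketch of how logged feedback could be turned into both kinds of training data, in case it helps. Everything here is an assumption for illustration: the `FeedbackRecord` schema, the +1/-1 thumbs rating, and the output dict keys (`input`/`output`, `prompt`/`chosen`/`rejected`) are made up, so adapt them to whatever format your fine-tuning setup expects.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass
class FeedbackRecord:
    """One logged interaction (hypothetical schema)."""
    prompt: str
    response: str
    rating: int  # assumed convention: +1 thumbs up, -1 thumbs down


def build_sft_pairs(records, min_rating=1):
    """Keep only well-rated responses as input-output pairs for instruction fine-tuning."""
    return [
        {"input": r.prompt, "output": r.response}
        for r in records
        if r.rating >= min_rating
    ]


def build_preference_pairs(records):
    """Group responses by prompt and emit chosen/rejected pairs for preference-based training."""
    by_prompt = defaultdict(list)
    for r in records:
        by_prompt[r.prompt].append(r)

    pairs = []
    for prompt, group in by_prompt.items():
        for a, b in combinations(group, 2):
            if a.rating == b.rating:
                continue  # equal ratings carry no preference signal
            chosen, rejected = (a, b) if a.rating > b.rating else (b, a)
            pairs.append(
                {"prompt": prompt, "chosen": chosen.response, "rejected": rejected.response}
            )
    return pairs


# Toy feedback log; in practice this would come from your application's logging.
log = [
    FeedbackRecord("What is a data flywheel?", "A loop where usage data improves the model.", 1),
    FeedbackRecord("What is a data flywheel?", "A kind of engine part.", -1),
    FeedbackRecord("Define RLHF.", "Reinforcement learning from human feedback.", 1),
]

sft_pairs = build_sft_pairs(log)
preference_pairs = build_preference_pairs(log)
```

Note the preference pairs only exist for prompts where you have at least two differently rated responses, so for the RLHF route you'd probably need to show users more than one output per query (or regenerate and re-ask) rather than rely on a single thumbs up/down.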