Hacker News
by henry_pulver 1055 days ago
Not seen a great explainer on this yet.

You'd either need access to the model weights or a fine-tuning API.

Then the user data you need to collect depends on which fine-tuning approach you want to use. RLHF needs multiple outputs for a single query, ranked by preference, whereas instruction fine-tuning needs high-quality input-output pairs to train on. Either way, you could ask for the user's feedback after running the LLM to pick out good training data.
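To make the difference concrete, here's a rough sketch of how logged interactions plus user feedback could be turned into either kind of dataset. The `Interaction` schema, the `thumbs_up`/`chosen_index` feedback fields, and the `prompt`/`completion`/`chosen`/`rejected` keys are all hypothetical. Every fine-tuning API or training library has its own required format, so treat this as an illustration of the data shapes, not a drop-in pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interaction:
    """One logged LLM call plus optional user feedback (hypothetical schema)."""
    prompt: str
    outputs: List[str]                   # candidate completions shown to the user
    chosen_index: Optional[int] = None   # which output the user preferred, if asked
    thumbs_up: Optional[bool] = None     # approval of a single output, if asked


def to_sft_pairs(logs: List[Interaction]) -> List[dict]:
    """Instruction fine-tuning: keep (prompt, completion) pairs the user endorsed."""
    pairs = []
    for log in logs:
        if log.chosen_index is not None:
            pairs.append({"prompt": log.prompt, "completion": log.outputs[log.chosen_index]})
        elif log.thumbs_up and len(log.outputs) == 1:
            pairs.append({"prompt": log.prompt, "completion": log.outputs[0]})
    return pairs


def to_preference_pairs(logs: List[Interaction]) -> List[dict]:
    """RLHF reward-model data: needs >= 2 outputs per prompt and a user choice."""
    records = []
    for log in logs:
        if log.chosen_index is None or len(log.outputs) < 2:
            continue  # a single output gives nothing to compare against
        chosen = log.outputs[log.chosen_index]
        for i, other in enumerate(log.outputs):
            if i != log.chosen_index:
                records.append({"prompt": log.prompt, "chosen": chosen, "rejected": other})
    return records


if __name__ == "__main__":
    logs = [
        Interaction("Summarise X", ["A short summary"], thumbs_up=True),
        Interaction("Translate Y", ["Out 1", "Out 2", "Out 3"], chosen_index=1),
        Interaction("Explain Z", ["Meh answer"], thumbs_up=False),
    ]
    print(to_sft_pairs(logs))
    print(to_preference_pairs(logs))
```

Note that the same "pick the best of several outputs" feedback feeds both datasets, while a simple thumbs-up on a single output is only useful for instruction fine-tuning, since RLHF needs something to compare against.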