Not seen a great explainer on this yet.

You'd either need access to the model weights or a fine-tuning API.

Then, depending on which fine-tuning approach you want to use, the user data you need to collect will differ: RLHF requires multiple outputs for a single query, with a signal for which one people prefer, whereas instruction fine-tuning needs high-quality input-output pairs to train on. You could ask for the user's feedback after running the LLM to pick out good training data.
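
To make the difference concrete, here's a rough sketch of turning one feedback log into both kinds of dataset. Everything in it is made up for illustration: the `Feedback` record, the 1-5 `rating` scale, the `min_rating` cutoff, and the `prompt`/`completion`/`chosen`/`rejected` field names are assumptions, not the format of any particular fine-tuning API.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Feedback:
    """One logged LLM response plus the user's rating (hypothetical schema)."""
    query: str
    output: str
    rating: int  # assumed 1-5 scale, higher is better


def sft_pairs(log, min_rating=4):
    """Instruction fine-tuning data: keep only highly rated input-output pairs."""
    return [
        {"prompt": f.query, "completion": f.output}
        for f in log
        if f.rating >= min_rating
    ]


def preference_pairs(log):
    """Preference data: compare outputs that answer the same query."""
    by_query = {}
    for f in log:
        by_query.setdefault(f.query, []).append(f)

    pairs = []
    for query, items in by_query.items():
        for a, b in combinations(items, 2):
            if a.rating == b.rating:
                continue  # a tie carries no preference signal
            chosen, rejected = (a, b) if a.rating > b.rating else (b, a)
            pairs.append(
                {"prompt": query, "chosen": chosen.output, "rejected": rejected.output}
            )
    return pairs
```

Note that a query with only one logged output can still feed instruction fine-tuning if it was rated well, but it yields no preference pairs, so for the RLHF route you'd have to deliberately sample more than one response per query.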