Hacker News
by gardnr 811 days ago
You probably want to build a retrieval-augmented generation (RAG) pipeline.
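To make the retrieve-then-prompt idea concrete, here is a toy sketch in plain Python. It scores documents by bag-of-words cosine similarity, which is a stand-in chosen only to keep the example dependency-free; a real pipeline would more likely use an embedding model and a vector index. `DOCS`, `retrieve`, and `build_prompt` are hypothetical names made up for this illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    # Cosine similarity between two term-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, docs, k=2):
    # Return the k documents most similar to the query.
    q = Counter(tokenize(query))
    return sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)[:k]

def build_prompt(query, docs, k=2):
    # Put the retrieved passages in front of the question before calling the model.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical mini knowledge base, for illustration only.
DOCS = [
    "QLoRA fine-tunes a 4-bit quantized model through low-rank adapters.",
    "DPO trains on pairs of chosen and rejected completions.",
    "Grammars constrain sampling so the output parses as JSON.",
]

if __name__ == "__main__":
    print(build_prompt("How does DPO use rejected completions?", DOCS, k=1))
```

The point is that the model's weights never change: the knowledge lives in the documents, and only the prompt is assembled per query.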

If you do end up wanting to fine-tune, use QLoRA with Axolotl or Unsloth to prove your hypothesis on a smaller model first, then evaluate whether the marginal gains from full-precision training are worth it to you.
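A back-of-the-envelope calculation shows why the quantized route is the cheap way to test an idea. The sketch below estimates memory for the weights alone; the 7B parameter count is an assumed example size, not something from this thread, and the estimate ignores activations, optimizer state, adapter weights, and quantization overhead.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

if __name__ == "__main__":
    n_params = 7e9  # assumed example size ("7B"), purely illustrative
    for bits in (16, 4):
        print(f"{bits}-bit weights: {weight_memory_gb(n_params, bits):.1f} GB")
    # 16-bit weights: 14.0 GB
    # 4-bit weights: 3.5 GB
```

With 4-bit base weights, a model of that assumed size leaves most of a 24 GB card free for everything else training needs, which is roughly why QLoRA fits on consumer GPUs.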

After you fine-tune it on a 100M-token dataset, use DPO to polish it off. You need to create a DPO dataset for that, but it can be relatively small and still give you some great gains.
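For reference, a DPO dataset is a list of preference pairs, and the loss compares how much the policy and a frozen reference model each prefer the chosen completion over the rejected one. Below is a minimal sketch of the per-pair loss in plain Python. The record contents and the log-probability values are invented for illustration, and `beta=0.1` is only a commonly used default, not a recommendation from this thread.

```python
import math

# One hypothetical preference record: same prompt, a preferred and a dispreferred answer.
example_pair = {
    "prompt": "Summarize the ticket in one sentence.",
    "chosen": "The user cannot log in after resetting their password.",
    "rejected": "There is a ticket.",
}

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one pair, given summed log-probs of each completion.

    loss = -log(sigmoid(beta * margin)), where
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected).
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = -beta * margin
    # Numerically stable softplus: log(1 + exp(x)).
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

if __name__ == "__main__":
    # Made-up log-probs: the policy likes the chosen answer more than the reference does.
    print(dpo_loss(policy_chosen=-10.0, policy_rejected=-12.0,
                   ref_chosen=-11.0, ref_rejected=-11.0))
```

When the policy and the reference agree exactly, the margin is zero and the loss is log 2; it drops as the policy shifts probability toward the chosen completion relative to the reference.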

After that, look at applying grammars during inference if you expect structured output such as JSON.
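The idea behind grammar-constrained inference is to mask out, at every decoding step, any token that would make the output impossible to complete into something valid. The toy below uses a finite set of allowed outputs instead of a real grammar, and a fake scoring table instead of a model, so everything here (`VALID_OUTPUTS`, `VOCAB`, `FAKE_SCORES`) is hypothetical. Real implementations, such as GBNF grammars in llama.cpp, apply the same masking idea with a proper grammar.

```python
import json

# Stand-in for a grammar: the only strings the decoder may produce.
VALID_OUTPUTS = ['{"answer": "yes"}', '{"answer": "no"}']

# Toy vocabulary and fake "model" preferences (higher is more likely).
VOCAB = ['{"answer": "', "yes", "no", "maybe", '"}', "Sure! "]
FAKE_SCORES = {"Sure! ": 5, "maybe": 4, "yes": 3, "no": 2, '{"answer": "': 1, '"}': 0}

def allowed(prefix, token):
    # A token is legal only if prefix + token can still grow into a valid output.
    candidate = prefix + token
    return any(v.startswith(candidate) for v in VALID_OUTPUTS)

def constrained_greedy(score_fn, vocab, max_steps=50):
    out = ""
    for _ in range(max_steps):
        if out in VALID_OUTPUTS:
            return out
        legal = [t for t in vocab if allowed(out, t)]
        if not legal:
            raise ValueError(f"no legal token after {out!r}")
        # Greedy choice among legal tokens only; illegal ones are masked out.
        out += max(legal, key=lambda t: score_fn(out, t))
    raise ValueError("did not reach a valid output")

if __name__ == "__main__":
    result = constrained_greedy(lambda prefix, tok: FAKE_SCORES[tok], VOCAB)
    print(result, json.loads(result))
```

Unconstrained, this fake model would start with "Sure! "; with the mask it can only ever emit output that parses.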

You should be able to run the experiments on 4090s rented from vast.ai, RunPod, or a similar service.

It can cost less than $100 depending on your requirements.

2 comments

This is great advice!

I'd like to add that if you don't have pairwise preference data (A > B) but do have binary data (A is good for x_1, B is good for x_2, etc.), then Kahneman-Tversky Optimization (KTO) might be a better fit. Despite learning from a weaker signal, it works as well as or better than DPO in practice.
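The difference between the two data shapes is easy to see side by side. In the sketch below, the field names (`prompt`/`chosen`/`rejected` for pairs, `prompt`/`completion`/`label` for binary feedback) are an assumption based on a common convention in preference-tuning libraries; check the schema your trainer expects. The records themselves are invented.

```python
# Pairwise data (DPO-style): both answers belong to the same prompt.
pairwise = [
    {"prompt": "Translate 'chat' to English.", "chosen": "cat", "rejected": "chat room"},
]

# Binary data (KTO-style): each row is one completion with a thumbs-up or thumbs-down.
# Prompts do not need to come in matched pairs.
binary = [
    {"prompt": "Translate 'chien' to English.", "completion": "dog", "label": True},
    {"prompt": "Translate 'oiseau' to English.", "completion": "fish", "label": False},
]

def pairs_to_binary(pairs):
    """Split each preference pair into one desirable and one undesirable row."""
    rows = []
    for p in pairs:
        rows.append({"prompt": p["prompt"], "completion": p["chosen"], "label": True})
        rows.append({"prompt": p["prompt"], "completion": p["rejected"], "label": False})
    return rows

if __name__ == "__main__":
    for row in binary + pairs_to_binary(pairwise):
        print(row)
```

Pairs can always be flattened into binary rows like this, but not the other way around, which is why binary feedback is the easier kind to collect.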

Do you have any tutorials to achieve all this? Thanks.