by WhiteOwlEd 892 days ago
If you are using no-code solutions, increasing how often an "idea" appears in the training dataset will make that idea more likely to appear in the output.

If you are fine-tuning your own LLM, there are other ways to get your idea to appear. In the literature this is sometimes called RLHF or preference optimization, and here are a few approaches:

Direct Preference Optimization

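DPO learns from pairwise preferences (a chosen response and a rejected one) using the Bradley-Terry model, which is the same kind of model that underlies Elo scores. Elo is used in chess and basketball to rank competitors who face each other in pairs.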
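To make the pairwise idea concrete, here is a minimal sketch of the DPO loss for a single (chosen, rejected) pair. It assumes you already have the summed token log-probabilities of each response under the model being trained and under a frozen reference model; the function name, argument names, and the beta value are my own choices for illustration, not any library's API. The sigmoid is the same logistic form Elo uses for an expected win probability.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (hypothetical helper, inputs are log-probs)."""
    # Implicit reward: how much more the policy likes a response than the reference does.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Loss is -log(sigmoid(margin)), written in a numerically stable softplus form.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# No preference learned yet: the policy equals the reference, so the loss is log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# The policy now favors the chosen response, so the loss drops.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

The loss only falls when the gap between chosen and rejected grows relative to the reference model, which is why nothing stops it from pushing that gap very far on a small dataset.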

@argilla_io on X.com has been doing some work on evaluating DPO.

Here is a decent thread on this: https://x.com/argilla_io/status/1745057571696693689?s=20

Identity Preference Optimization

IPO is research from Google DeepMind. It removes the reliance on Elo-style (Bradley-Terry) scores to address overfitting issues in DPO.
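As a rough sketch of the difference, the IPO objective can be written as a squared error that pulls the log-ratio gap toward a fixed target instead of pushing it through a sigmoid. This assumes the same inputs as a DPO-style loss (summed log-probabilities under the policy and a frozen reference); the function name, argument names, and the tau value are my own, picked for illustration.

```python
def ipo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, tau=0.5):
    """IPO-style loss for one preference pair (hypothetical helper)."""
    # Gap between chosen and rejected log-probs, measured relative to the reference model.
    h = ((policy_chosen_logp - policy_rejected_logp)
         - (ref_chosen_logp - ref_rejected_logp))
    # Regress the gap toward a finite target rather than maximizing it without bound.
    target = 1.0 / (2.0 * tau)
    return (h - target) ** 2

print(ipo_loss(-10.0, -10.0, -10.0, -10.0))  # gap 0, target 1
print(ipo_loss(-9.0, -10.0, -10.0, -10.0))   # gap 1, exactly on target
print(ipo_loss(-7.0, -10.0, -10.0, -10.0))   # gap 3, overshooting is penalized too
```

Because overshooting the target is penalized as much as undershooting, the gap cannot be driven arbitrarily large, which is the intuition behind the overfitting fix.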

Paper: https://x.com/kylemarieb/status/1728281581306233036?s=20

Kahneman-Tversky Optimization

KTO is an approach that uses mono preference data: each response gets a single label instead of being ranked against an alternative. For example, it asks if a response is "good or not." This is helpful for a lot of real-world situations (e.g. "Is the restaurant well liked?").
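Here is a minimal sketch of what mono preference data and a KTO-style per-example loss might look like. It is a simplification under stated assumptions: the reference point `kl_ref` is passed in as a plain number (in the paper it is estimated from a batch), and the class name, function names, and default weights are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class UnaryExample:
    prompt: str
    response: str
    desirable: bool  # one thumbs-up / thumbs-down label, no paired alternative

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logp, ref_logp, desirable,
             kl_ref=0.0, beta=0.1, lambda_d=1.0, lambda_u=1.0):
    """KTO-style loss for one labeled response (hypothetical helper)."""
    # Implicit reward: log-ratio of the policy to the frozen reference model.
    reward = policy_logp - ref_logp
    if desirable:
        # Good responses should score above the reference point.
        return lambda_d * (1.0 - sigmoid(beta * (reward - kl_ref)))
    # Bad responses should score below the reference point.
    return lambda_u * (1.0 - sigmoid(beta * (kl_ref - reward)))

example = UnaryExample("Is the restaurant well liked?", "Yes, reviews are strong.", True)
print(kto_loss(-5.0, -10.0, example.desirable))  # policy already favors it, loss below 0.5
print(kto_loss(-5.0, -10.0, False))              # same scores, labeled bad, loss above 0.5
```

The practical upside is that each row of data needs only one response and one label, which is much easier to collect than ranked pairs.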

Here is a brief discussion on it:

https://x.com/ralphbrooks/status/1744840033872330938?s=20

Here is more on KTO:

* Paper: https://github.com/ContextualAI/HALOs/blob/main/assets/repor...

* Code: https://github.com/ContextualAI/HALOs