If you are using no-code solutions, increasing how often an "idea" appears in a dataset will make the model more likely to produce that idea. If you are fine-tuning your own LLM, there are other ways to get your idea to appear. In the literature this is discussed under RLHF and, more specifically, preference optimization. Here are a few approaches:

Direct Preference Optimization

DPO learns from pairwise preferences: for each prompt, a "chosen" response and a "rejected" one. It models those preferences with the Bradley-Terry model, the same logistic model behind Elo scores. Elo is used in chess and basketball to rank players or teams who compete head-to-head. @argilla_io on X.com has been doing some work evaluating DPO. Here is a decent thread on this: https://x.com/argilla_io/status/1745057571696693689?s=20

Identity Preference Optimization

IPO is research from Google DeepMind. It removes DPO's reliance on the Bradley-Terry (Elo-style) assumption, which converts pairwise preferences into pointwise scores, to address DPO's tendency to overfit.

Paper (shared on X): https://x.com/kylemarieb/status/1728281581306233036?s=20

Kahneman-Tversky Optimization

KTO is an approach that uses binary, unpaired feedback instead of preference pairs. For example, it only needs to know whether a response is "good or not." This is helpful for a lot of real-world situations (e.g. "Is the restaurant well liked?"). Here is a brief discussion on it: https://x.com/ralphbrooks/status/1744840033872330938?s=20

Here is more on KTO:
* Paper: https://github.com/ContextualAI/HALOs/blob/main/assets/repor...
* Code: https://github.com/ContextualAI/HALOs
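To make the DPO description concrete, here is a minimal sketch in plain Python. It assumes you already have the summed log-probability of each response under the model being trained (the "policy") and under a frozen reference model. It also shows why Elo comes up: Elo's expected score is exactly the Bradley-Terry model with ratings rescaled by ln(10)/400. The function names and the `beta = 0.1` default are illustrative choices, not the API of any particular library.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z: float) -> float:
    """log(1 + exp(z)) without overflow for large |z|."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Elo's predicted probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def bradley_terry(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that A is preferred over B."""
    return sigmoid(reward_a - reward_b)


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss from summed sequence log-probabilities.

    Each response's implicit reward is beta * (log pi_theta - log pi_ref).
    The loss is the negative log of the Bradley-Terry probability that the
    chosen response beats the rejected one under those rewards.
    """
    margin = beta * ((policy_chosen - ref_chosen)
                     - (policy_rejected - ref_rejected))
    return softplus(-margin)  # equals -log(sigmoid(margin))


if __name__ == "__main__":
    # Elo is Bradley-Terry with ratings rescaled by ln(10) / 400.
    scale = math.log(10) / 400
    print(elo_expected_score(1600, 1200))             # about 0.909
    print(bradley_terry(1600 * scale, 1200 * scale))  # same value
    # Policy identical to the reference: loss is log(2), about 0.693.
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy shifts probability toward the chosen response: loss drops.
    print(dpo_loss(-9.0, -13.0, -10.0, -12.0))
```

Note that the DPO loss keeps decreasing as the margin grows, with no upper limit on how far it "wants" to push the chosen response above the rejected one.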
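IPO's fix for overfitting can be sketched in a few lines, assuming the squared-error form of the loss given in the IPO paper. The margin `h` is the log-probability ratio against a reference model for the chosen response minus the same ratio for the rejected one. DPO feeds this margin through a log-sigmoid that is minimized only as `h` goes to infinity; IPO instead regresses `h` onto a fixed target of 1/(2*tau). The function name, argument names, and `tau = 0.1` default are illustrative assumptions.

```python
def ipo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             tau: float = 0.1) -> float:
    """Per-pair IPO loss in its squared-error form.

    h is the log-ratio margin between chosen and rejected responses.
    Regressing h onto a finite target keeps the policy from drifting
    arbitrarily far from the reference model on the training pairs.
    """
    h = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    target = 1.0 / (2.0 * tau)
    return (h - target) ** 2


if __name__ == "__main__":
    # With tau = 0.1 the target margin is 5.
    print(ipo_loss(-5.0, -10.0, -10.0, -10.0))   # h = 5: loss at its minimum
    print(ipo_loss(0.0, -10.0, -10.0, -10.0))    # h = 10: overshoot is penalized
    print(ipo_loss(-10.0, -10.0, -10.0, -10.0))  # h = 0: undershoot is penalized
```

The design choice is the point: overshooting the target margin is penalized just like undershooting it, which acts as a regularizer toward the reference model.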
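The "good or not" labels KTO uses can be illustrated with a per-example loss, sketched here under stated assumptions. KTO's value function is modeled on Kahneman and Tversky's prospect theory: a response's implicit reward (its log-probability ratio against a reference model) is compared to a reference point `z0`. The paper estimates `z0` as a KL divergence between the policy and reference model over a batch; this sketch simplifies that to a number passed in as the hypothetical `kl_ref` argument. The names and defaults (`beta`, `lambda_d`, `lambda_u`) are illustrative, not the HALOs repo's API.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def kto_loss(policy_logp: float, ref_logp: float, desirable: bool,
             kl_ref: float = 0.0, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Per-example KTO loss for one (prompt, response, good-or-not) label.

    kl_ref is a hypothetical stand-in for the reference point z0; here it
    is simply passed in rather than estimated from a batch.
    """
    reward = policy_logp - ref_logp  # implicit reward: log-probability ratio
    z0 = max(kl_ref, 0.0)            # KL is non-negative, so clamp at zero
    if desirable:
        # Good responses gain value as their reward rises above z0.
        value = lambda_d * sigmoid(beta * (reward - z0))
        return lambda_d - value
    # Bad responses gain value as their reward falls below z0.
    value = lambda_u * sigmoid(beta * (z0 - reward))
    return lambda_u - value


if __name__ == "__main__":
    # A thumbs-up example and a thumbs-down example; no pairing needed.
    print(kto_loss(-9.0, -10.0, desirable=True))   # below 0.5: rewarded
    print(kto_loss(-9.0, -10.0, desirable=False))  # above 0.5: penalized
```

Because each example carries its own label, a dataset of independent thumbs-up/thumbs-down ratings (like "Is the restaurant well liked?") can be used directly, and `lambda_d` / `lambda_u` let you reweight the two classes if one is much more common.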