| > Such a service is the opposite of a flywheel (a brake?) in practice. Those tokens are extremely low quality data. I think not. Say you ask the model to help solve a coding problem. It gives you an idea, you try it, it fails, and you come back and iterate. The provider can save a note for later fine-tuning on what worked and what didn't, using you, the user, as a validation system for the LLM. You might also bring your own experience and help the model where it struggles until the task is finally done. That is how the model can borrow both your experience and your manual validation work to improve itself. Some tasks are spread over multiple sessions or multiple days. The provider can cluster those sessions and look at your progress over time. The later steps provide rich feedback on the quality of the earlier ones. Hindsight is 20/20. Even in chats where the user doesn't perform validation there is rich feedback, because people share some of their tacit experience. It's a form of delayed feedback; humans act as caches of unique experience. The way I conceptualize this is as a search process, a search through problem space. LLMs search better with assistance, and humans also search better with assistance. LLMs collect experience from millions of people and funnel it into their logs. |