by viraptor
537 days ago
This is a good read for some examples: https://arxiv.org/abs/2203.14465

> This technique, the "Self-Taught Reasoner" (STaR), relies on a simple loop: generate rationales to answer many questions, prompted with a few rationale examples; if the generated answers are wrong, try again to generate a rationale given the correct answer; fine-tune on all the rationales that ultimately yielded correct answers; repeat. We show that STaR significantly improves performance on multiple datasets compared to a model fine-tuned to directly predict final answers

But there are a few others. In general, good data is good data, and we're definitely learning more about how to produce good synthetic versions of it.
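A minimal Python sketch of the STaR loop as quoted. It is not the paper's code. `generate(question, hint)` and `fine_tune(triples)` are hypothetical interfaces. The toy "model" at the bottom only memorizes answers and stands in for a real LLM. Whether `fine_tune` starts from the base model or the previous checkpoint is left to the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Example:
    question: str
    answer: str


# Hypothetical interfaces (assumptions, not from the paper):
# generate(question, hint) -> (rationale, predicted_answer); hint is the
# correct answer during the retry ("rationalization") step, else None.
Generate = Callable[[str, Optional[str]], Tuple[str, str]]
# fine_tune(triples) -> a new generate function trained on
# (question, rationale, answer) triples.
FineTune = Callable[[List[Tuple[str, str, str]]], Generate]


def star(dataset: List[Example], generate: Generate,
         fine_tune: FineTune, iterations: int) -> Generate:
    for _ in range(iterations):
        kept: List[Tuple[str, str, str]] = []
        for ex in dataset:
            # 1. Generate a rationale and answer without any hint.
            rationale, pred = generate(ex.question, None)
            if pred != ex.answer:
                # 2. Wrong answer: retry with the correct answer as a hint.
                rationale, pred = generate(ex.question, ex.answer)
            if pred == ex.answer:
                # 3. Keep only rationales that ended at the correct answer.
                kept.append((ex.question, rationale, ex.answer))
        # 4. Fine-tune on the kept rationales, then repeat the loop.
        generate = fine_tune(kept)
    return generate


# Toy stand-ins (hypothetical): the "model" just memorizes answers.
def make_toy_model(known: Dict[str, str]) -> Generate:
    def generate(question: str, hint: Optional[str]) -> Tuple[str, str]:
        if question in known:
            return f"recall {question}", known[question]
        if hint is not None:
            return f"work back from {hint}", hint
        return "guess", "?"
    return generate


def toy_fine_tune(triples: List[Tuple[str, str, str]]) -> Generate:
    return make_toy_model({q: a for q, _, a in triples})


data = [Example("2+2", "4"), Example("3*3", "9")]
model = star(data, make_toy_model({"2+2": "4"}), toy_fine_tune, iterations=2)
print(model("3*3", None))  # ('recall 3*3', '9')
```

The base toy model only knows `2+2`. It gets `3*3` right in the first round only through the hinted retry. After fine-tuning on that rationale, it answers without a hint. That is the effect the loop relies on, though a real model learns far less reliably than a lookup table.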
Data smuggling is a known phenomenon in similar tasks.