by madiator
512 days ago
Synthetic data got a bad reputation last year, but it is now an important component of all modern LLMs! In fact, we also trained a model for, ironically, detecting hallucinations, and it too was trained on synthetic data. Say you have some PDFs and want to generate questions and answers to test your RAG pipeline: that's synthetic data! Distillation is mostly synthetic data and works great as well! Our hope is that this becomes steroids rather than LSD for LLMs :)