by kaushik92
719 days ago
We identified and solved two key problems with generating data using GPT:
1. Duplicate or similar data points: we add a deduplication step to our pipeline.
2. Incorrect question-answer pairs: we check each pair for correctness and context relevance, and filter out the rows that fail.

Apart from this, we generate a diverse set of questions, including complex reasoning and chain-of-thought questions. We also generate domain-specific unsafe questions, i.e. questions that violate the terms and conditions (TnC) of the particular LLM, to test the model's guardrails.
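To make the two fixes concrete (deduplication, then filtering out incorrect rows), here is a rough sketch of what such a pipeline stage could look like. This is not our actual implementation: the token-overlap (Jaccard) similarity, the 0.8 threshold, the `question` field name, and the helper names `dedupe` and `filter_rows` are all assumptions for illustration. The correctness and relevance check is left as a caller-supplied function, since in practice it would likely be another model call or a human review step.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def jaccard(tokens_a, tokens_b):
    """Token-set overlap in [0, 1]; 1.0 means identical token sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def dedupe(rows, threshold=0.8):
    """Keep a row only if its question is not too similar to one already kept.

    Assumption: similarity is plain token overlap; an embedding-based
    measure would slot in the same way. This is O(n^2) in the number of rows.
    """
    kept, kept_tokens = [], []
    for row in rows:
        tokens = normalize(row["question"])
        if any(jaccard(tokens, seen) >= threshold for seen in kept_tokens):
            continue  # near-duplicate of an earlier row, drop it
        kept.append(row)
        kept_tokens.append(tokens)
    return kept


def filter_rows(rows, is_correct):
    """Drop rows that fail the caller-supplied correctness/relevance check."""
    return [row for row in rows if is_correct(row)]


# Hypothetical example data; "verified" stands in for the result of a real check.
rows = [
    {"question": "What is the capital of France?", "answer": "Paris", "verified": True},
    {"question": "what is the capital of france", "answer": "Paris", "verified": True},
    {"question": "How do vaccines work?", "answer": "They are magnets.", "verified": False},
]
clean = filter_rows(dedupe(rows), is_correct=lambda row: row["verified"])
```

Running deduplication before the correctness check keeps the number of (presumably more expensive) checks down, though the order could be swapped if the check is cheap.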