by tacoooooooo
191 days ago
Standard benchmarks (like BEIR/MS MARCO) are great, but they are likely already in the training distribution of foundation models, and crucially, they lack the complex, structured metadata needed to test real-world filtering scenarios (e.g., "Find docs from region X, between dates Y and Z, with tag A"). datasetFactory is an orchestrated LLM pipeline that turns a single natural-language prompt into a (potentially) massive, structured evaluation dataset.
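
To make the filtering scenario concrete, here is a rough sketch of what a metadata-filtered query and its ground truth could look like. This is not datasetFactory's actual schema or API; `Doc`, `MetadataFilter`, and `relevant_ids` are hypothetical names made up for illustration, and the sample documents are invented.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Doc:
    # Hypothetical document record with structured metadata.
    doc_id: str
    region: str
    published: date
    tags: set[str] = field(default_factory=set)


@dataclass
class MetadataFilter:
    # Hypothetical filter: "region X, between dates Y and Z, with tag A".
    region: str
    start: date
    end: date
    tag: str

    def matches(self, doc: Doc) -> bool:
        # All three conditions must hold; the date range is inclusive.
        return (
            doc.region == self.region
            and self.start <= doc.published <= self.end
            and self.tag in doc.tags
        )


def relevant_ids(docs: list[Doc], flt: MetadataFilter) -> list[str]:
    # Ground-truth IDs a retriever should return for this filter.
    return [d.doc_id for d in docs if flt.matches(d)]


# Invented sample corpus, for illustration only.
docs = [
    Doc("d1", "EMEA", date(2024, 3, 15), {"security", "audit"}),
    Doc("d2", "EMEA", date(2023, 1, 10), {"security"}),  # outside date range
    Doc("d3", "APAC", date(2024, 3, 20), {"security"}),  # wrong region
    Doc("d4", "EMEA", date(2024, 4, 1), {"billing"}),    # missing tag
]

flt = MetadataFilter("EMEA", date(2024, 1, 1), date(2024, 12, 31), "security")
print(relevant_ids(docs, flt))
```

An eval built this way can score a retriever on whether it honors every constraint at once, which plain text-relevance benchmarks don't exercise.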