| HN Mirror



	by tacoooooooo 239 days ago
	Standard benchmarks (like BEIR/MS MARCO) are great, but they are likely already in the training data of foundation models. Crucially, they also lack the complex, structured metadata needed to test real-world filtering scenarios (e.g., "Find docs from region X, between dates Y and Z, with tag A"). datasetFactory is an orchestrated LLM pipeline that turns a single natural-language prompt into a (potentially) massive, structured evaluation dataset.
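
To make the filtering scenario concrete, here is a minimal sketch of what a metadata-filtered evaluation case might look like. This is not datasetFactory's actual schema or API: the `Doc` fields, the `matches` helper, and the sample documents are all hypothetical and exist only to illustrate the "region X, between dates Y and Z, with tag A" style of query.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Doc:
    # Hypothetical document record; field names are assumptions for illustration.
    doc_id: str
    text: str
    region: str
    published: date
    tags: set[str] = field(default_factory=set)


def matches(doc: Doc, region: str, start: date, end: date, tag: str) -> bool:
    """Return True if the doc satisfies all three metadata constraints."""
    return (
        doc.region == region
        and start <= doc.published <= end  # inclusive date range
        and tag in doc.tags
    )


# Made-up corpus: only d1 satisfies every constraint of the query below.
docs = [
    Doc("d1", "Q1 shipping report", "EMEA", date(2024, 2, 10), {"logistics"}),
    Doc("d2", "Q1 shipping report", "APAC", date(2024, 2, 11), {"logistics"}),
    Doc("d3", "Old shipping report", "EMEA", date(2022, 6, 1), {"logistics"}),
    Doc("d4", "Q1 hiring plan", "EMEA", date(2024, 3, 5), {"hr"}),
]

# "Find docs from region EMEA, between 2024-01-01 and 2024-03-31, with tag logistics"
relevant_ids = [
    d.doc_id
    for d in docs
    if matches(d, "EMEA", date(2024, 1, 1), date(2024, 3, 31), "logistics")
]
print(relevant_ids)
```

The point of the near-miss documents (d2, d3, d4) is that each one fails exactly one constraint, which is the kind of case a text-only benchmark cannot express.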