| HN Mirror



	by atemerev 432 days ago
	10 million records is a toy dataset; it usually fits in memory on a laptop. There are open large(-ish) text datasets, like full Wikipedia or pre-2022 Reddit comments, that would work much better for benchmarking.
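
	A rough back-of-envelope check of the "fits in memory" claim, as a minimal sketch. The average record sizes below are hypothetical (the comment does not give one), and the helper `dataset_size_gib` is made up for illustration; it counts raw payload only, with no index or runtime overhead.

```python
def dataset_size_gib(num_records: int, avg_record_bytes: int) -> float:
    """Raw payload size in GiB, ignoring index and runtime overhead."""
    return num_records * avg_record_bytes / 2**30


NUM_RECORDS = 10_000_000  # the record count mentioned in the comment

# Hypothetical average record sizes (assumptions, not from any benchmark)
for avg_bytes in (100, 1_000, 4_000):
    size = dataset_size_gib(NUM_RECORDS, avg_bytes)
    print(f"{avg_bytes:>5} B/record -> {size:.2f} GiB")
```

	Under these assumptions, records of around 1 KB or less stay below 10 GiB of raw data, which is within reach of a laptop with 16 GB of RAM. Much larger records, or heavy per-record overhead in the store being tested, would change the picture.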