HN Mirror


	by beckhamc 881 days ago
	The issue is the obsession with benchmark datasets and their flaky evaluation

1 comment

What else could you do to test it, besides "it works for me" and "this test said it's good at talking"?