Hacker News
by
beckhamc
881 days ago
The issue is the obsession with benchmark datasets and their flaky evaluation
1 comment
graphe
881 days ago
What else could you do to test it, besides "it works for me" and "this test said it's good at talking"?