by YeGoblynQueenne 2659 days ago
Just to add a bit to this:

>> Using data from drafts carried out by humans, I trained a neural network to predict what card the humans would take out of each pack. It reached 60% accuracy at this task. And in the 40% when the human and the model differ, the model’s predicted choice is often better than what the human picked.

How the model's pick is "better than what the human picked" is never made clear, but since accuracy is measured by the model's ability to match its training set, I assume that's also what is meant by "better": the model was better than a human at memorising and reproducing the decks it saw during training.
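
To make that concrete, here is a minimal sketch of what I assume the reported accuracy amounts to: the fraction of picks where the model chose the same card as the human. The function name and the card names are hypothetical, made up for illustration, and not taken from the author's notebook.

```python
def pick_agreement(model_picks, human_picks):
    """Fraction of picks where the model chose the same card as the human."""
    if len(model_picks) != len(human_picks):
        raise ValueError("pick lists must be the same length")
    if not human_picks:
        raise ValueError("need at least one pick")
    matches = sum(m == h for m, h in zip(model_picks, human_picks))
    return matches / len(human_picks)


# Hypothetical picks, one entry per pack (illustrative data only).
human = ["Bolt", "Bear", "Counterspell", "Giant", "Shock"]
model = ["Bolt", "Bear", "Negate", "Giant", "Elf"]

agreement = pick_agreement(model, human)
print(agreement)  # 3 of 5 picks match
```

Note that nothing in this number says which side was right on the picks that differ. It only measures agreement with the human data.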

Well, you'd never evaluate a human's deckbuilding skills by how well they can reproduce a deck they've seen before. Given the same deck archetype, 10 humans will probably make 10 different card choices, for reasons of their own. It's like trying to evaluate how people style their hair by measuring how similar their hair looks to some examples of particular hair styles. It's a concrete measure, but it's also entirely meaningless.

This effort really suffers in terms of evaluation, and so we have learned nothing about how good the model is, which is a shame.

2 comments

The author was saying that, in their own estimation, limited by their personal playing skill, the model made a stronger pick than the human drafter did. They propose that individuals overrate and underrate cards, but the model aggregates those ratings and rates the cards appropriately. But it could also just be that the author is overrating cards - that’s why they asked for other opinions.
Yes, but that's an arbitrary and subjective qualification. It's eyeballing - useful as a tool to evaluate your model during development, maybe, but not what you report when you claim the finished model is "better than humans".
Reproducing a deck I've seen before exactly makes no sense in limited.

Maybe it's too different of an idea, but when I draft I absolutely evaluate (part of) my skill by how well I've reproduced important components of a good deck. Did I find my seat? Good curve? Enough removal? Then there are format-specific things - did I include enough 1/3s for 2, knowing that I'm likely to lose to fast decks with 2/1s if I don't?

>> Reproducing a deck I've seen before exactly makes no sense in limited.

Hi. From your write-up and a quick look at your notebook that's what your model is doing. And you measure its accuracy as its ability to do so. Is that incorrect?