Hacker News
by YeGoblynQueenne 2657 days ago
Sorry but this is extremely shoddy work. M:tG will never be "solved" like this.

>> Using data from drafts carried out by humans, I trained a neural network to predict what card the humans would take out of each pack. It reached 60% accuracy at this task.

Going by what's in the linked notebook, the model was evaluated on its ability to match the decks in its training set card-for-card.

Without any attempt to represent game semantics in the model, the fact that the model sometimes "predicts" different picks than the actual picks in the dataset tells us nothing. It probably means the model has some variance that causes it to make "mistakes" in its attempt to exactly reproduce its dataset. It certainly doesn't say that the model can draft a good M:tG deck, certainly not in any set other than Guilds of Ravnica.
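
To make the metric under discussion concrete, here is a minimal sketch of a pick-agreement accuracy, assuming it is simply the fraction of picks where the model's choice equals the human's. The function name and the card indices are hypothetical and are not taken from the linked notebook.

```python
def pick_accuracy(model_picks, human_picks):
    """Fraction of picks where the model chose the same card as the human."""
    if len(model_picks) != len(human_picks):
        raise ValueError("both lists must describe the same picks")
    if not human_picks:
        return 0.0
    matches = sum(m == h for m, h in zip(model_picks, human_picks))
    return matches / len(human_picks)


# Hypothetical card indices for five picks (illustrative values only).
human_picks = [12, 7, 7, 40, 3]
model_picks = [12, 7, 9, 40, 8]

accuracy = pick_accuracy(model_picks, human_picks)
print(accuracy)
```

A score computed this way only measures agreement with the recorded picks. It says nothing about whether a disagreeing pick was stronger or weaker.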

>> The model definitely understands the concept of color. In MTG there are 5 colors, and any given draft deck will likely only play cards from 2 or 3 of those colors. So if you’ve already taken a blue card, you should be more likely to take blue cards in future picks. We didn’t tell the model about this, and we also didn’t tell it which cards were which color. But it learned anyway, by observing which cards were often drafted in combination with each other.

This is a breathtakingly brash misinterpretation of the evidence. The model's representation of a M:tG card is its index in the Guilds of Ravnica card set. It has no representation of any card characteristic, including colour. If it had learned to represent "the concept of colour" in M:tG in this way, it wouldn't be a neural net, it would be a magick spell.
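
A small sketch of what "a card is just its index" means for the input, assuming a one-hot encoding over the card set (the notebook's exact encoding may differ). The card names are hypothetical and carry a colour only so a human can read the example.

```python
# Hypothetical three-card "set"; the encoder only ever sees the positions.
CARDS = ["blue_flyer", "red_burn", "blue_counter"]
card_to_index = {name: i for i, name in enumerate(CARDS)}


def one_hot(name):
    """Return a vector with a single 1 at the card's index."""
    vec = [0] * len(CARDS)
    vec[card_to_index[name]] = 1
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Two blue cards are exactly as dissimilar as a blue and a red card.
same_colour = dot(one_hot("blue_flyer"), one_hot("blue_counter"))
diff_colour = dot(one_hot("blue_flyer"), one_hot("red_burn"))
print(same_colour, diff_colour)
```

Both dot products are zero, so nothing about colour is present in the input itself. Any grouping by colour would have to come from the picks in the training data.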

The author suggests that the model "understands" colour because it drafts decks of specific colours. Well, its dataset consists of decks with cards of specific colours. It learned to reproduce those decks. It didn't learn anything about why those decks pick particular cards, or what particular cards are. All it has is a list of numbers that it has to learn to put together in specific ways.

This is as far from "understanding the concept of colour", or anything, as can be.

There are many more "holes" in the article's logic, that just go to show that you can train a neural net, but you can't do much with it unless you understand what you're doing.

Apologies to the author for the harsh critique, if he's reading this.


Just to add a bit to this:

>> Using data from drafts carried out by humans, I trained a neural network to predict what card the humans would take out of each pack. It reached 60% accuracy at this task. And in the 40% when the human and the model differ, the model’s predicted choice is often better than what the human picked.

How the model's pick is "better than what the human picked" is never made clear, but since accuracy is measured by the model's ability to match its training set, I assume that's also what is meant by "better": the model was better than a human in memorising and reproducing the decks it saw during training.

Well, you'd never evaluate a human's deckbuilding skills by how well they can reproduce a deck they've seen before. Given the same deck archetype, 10 humans will probably make 10 different card choices, for reasons of their own. It's like trying to evaluate how people style their hair by measuring how similar their hair looks to some examples of particular hair styles. It's a concrete measure, but it's also entirely meaningless.

This effort really suffers in terms of evaluation, and so we have learned nothing about how good the model is, which is a shame.

The author was saying that, in their own estimation (limited by their playing skill), the model made a stronger pick than the human drafter did. They propose that individuals overrate and underrate cards, but the model aggregates those opinions and rates the cards appropriately. But it could also just be that the author is overrating cards - that’s why they asked for other opinions.
Yes, but that's an arbitrary and subjective qualification. It's eyeballing - useful as a tool to evaluate your model during development, maybe, but not what you report when you claim the finished model is "better than humans".
Reproducing a deck I've seen before exactly makes no sense in limited.

Maybe it's too different of an idea, but when I draft I absolutely evaluate (part of) my skill by how well I've reproduced important components of a good deck. Did I find my seat? Good curve? Enough removal? Then there are format specific things - did I include enough 1/3s for 2, knowing that I'm likely to lose to fast decks with 2/1s if I don't?

>> Reproducing a deck I've seen before exactly makes no sense in limited.

Hi. From your write-up and a quick look at your notebook that's what your model is doing. And you measure its accuracy as its ability to do so. Is that incorrect?

> The author suggests that the model "understands" colour because it drafts decks of specific colours. Well, its dataset consists of decks with cards of specific colours. It learned to reproduce those decks. It didn't learn anything about why those decks pick particular cards, or what particular cards are. All it has is a list of numbers that it has to learn to put together in specific ways.

> This is as far from "understanding the concept of colour", or anything, as can be.

It is very arguably bad feature engineering - if you have the information readily available, don't make the network infer it - but I think the description is fair.

Word2vec uses a similar model. It starts out knowing nothing about each word except an arbitrary numeric index, and learns everything else by predicting words that appear next to each other. By the end of the training it clearly has internal representations of concepts like "color", "verb", "gender", etc.

The same concept should apply here - by observing what cards are used in similar decks, with enough training data it should eventually associate concepts like card type, color and mana costs to each card.

In this case there isn't enough training data for that kind of resolution, but it has learned that blue cards go with blue cards, and red cards with red cards, and there's no hard lines from there to the concept of color.
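
A rough illustration of "blue cards go with blue cards", using plain co-occurrence counts and cosine similarity as a stand-in. This is not word2vec and not the article's network. The decks and card names are made up, and the colour in each name is only there for readability: the code never looks at it.

```python
from itertools import combinations
from math import sqrt

# Made-up decks; the algorithm treats the names as opaque identifiers.
decks = [
    ["blue_a", "blue_b", "blue_c"],
    ["blue_a", "blue_b", "blue_c"],
    ["red_a", "red_b", "red_c"],
    ["red_a", "red_b", "red_c"],
    ["blue_a", "blue_c", "red_b"],
]

cards = sorted({card for deck in decks for card in deck})
index = {card: i for i, card in enumerate(cards)}

# counts[i][j] = number of decks in which cards i and j appear together.
counts = [[0] * len(cards) for _ in cards]
for deck in decks:
    for x, y in combinations(deck, 2):
        counts[index[x]][index[y]] += 1
        counts[index[y]][index[x]] += 1


def cosine(u, v):
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def similarity(card_1, card_2):
    """Cosine similarity between the co-occurrence rows of two cards."""
    return cosine(counts[index[card_1]], counts[index[card_2]])


print(similarity("blue_a", "blue_b"), similarity("blue_a", "red_a"))
```

On this toy data, cards that are drafted together end up with more similar rows than cards that are not. Whether that clustering deserves to be called a "concept of color" is exactly what the rest of the thread argues about.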

Sure this isn't going to "solve" MtG, and I don't think it is a particularly good approach for the problem statement, but I think the idea is workable, and the network could already contain a proto-concept of "color" that would be refined with more training.

>> In this case there isn't enough training data for that kind of resolution, but it has learned that blue cards go with blue cards, and red cards with red cards, and there's no hard lines from there to the concept of color.

A card is blue (resp. red, etc) because it has a blue mana symbol in its casting cost. Not because it is found in the company of other blue cards. That is the concept of colour that a model must represent before you can say with any conviction that it "understands" the concept of colour. In terms of "hard lines"- that's the hard line you must cross.

The kind of model you're talking about then would be a classifier able to label individual cards with their colours, or an end-to-end model with an internal representation of cards' characteristics. That is not what was shown here.

By observing what cards people tend to pick together, you can infer that certain cards have certain properties, even if you never get to see the card face.

A blue card is found in the company of other blue cards, because humans picked them, because of the blue mana symbol in its casting cost.

With proper training, you end up with exactly the "end-to-end model with an internal representation of cards' characteristics".

Since it can't see the cards, it can't say anything useful about a card it hasn't seen during training, but if you added some new cards and started training again, a pre-trained net might learn the new cards faster than one you train from scratch. That would be evidence that the network has learnt a meaningful embedding.

There is no proof that this network has done so, but I think word2vec shows that it's a feasible approach.

>> By observing what cards people tend to pick together, you can infer that certain cards have certain properties, even if you never get to see the card face.

You're assuming way too much capability that is not present. Just because a human can make this inference, it doesn't mean that a neural net can. Neural networks are notoriously incapable of inference, or anything that requires reasoning.

>> There is no proof that this network has done so, but I think word2vec shows that it's a feasible approach.

Word2vec (and word embeddings in general) is actually a good example of why this kind of thing doesn't work the way you think it does. A word embedding model represents information about the context in which tokens (words, sentences, etc) are found but it does not, in and of itself, represent the meaning of words. The only reason why we know that words it places in the general vicinity of each other have similar meaning is because we already understand meaning and we can interpret the results. But the model itself does not have anything like "understanding". It only models collocations.

Same thing here. You seem pretty certain that with more data (perhaps with a deeper model) you can represent something that the model doesn't have an internal representation for. But just because the behaviour of the model partially matches the behaviour of a system that does have an internal representation for such a thing, in other words, a human, that doesn't mean that the model also behaves the way it behaves because it models the world in the same way that the human does.

And you can see that very clearly if you try to use a model like the one in the article, or one trained on all the magic drafts ever, to draft a set of cards it hasn't seen before. It should be obvious that such a model would be entirely incapable of doing so. That's because it doesn't represent anything about the characteristics of cards it hasn't seen and so can't handle new cards. A human understands what the cards' characteristics mean and so can just pick up and play a new card with little trouble.
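
The unseen-card point can be sketched with a lookup table keyed by card identity, assuming the model stores one learned vector per card it saw in training. The card names and vector values below are hypothetical.

```python
# Hypothetical learned vectors, one per card seen during training.
embeddings = {
    "seen_card_1": [0.1, 0.3],
    "seen_card_2": [0.2, -0.4],
}


def embed(card):
    """Return the learned vector for a card, or None if it was never seen."""
    return embeddings.get(card)


known = embed("seen_card_1")
unseen = embed("card_from_new_set")
print(known, unseen)
```

A card from a new set has no entry, so a model built this way has nothing to work with until it is retrained on drafts that include that card.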

As to what I mean by "internal representation"; machine learning models that are trained end-to-end and that are claimed to learn constituent concepts in the process of learning a target concept actually have concrete representations of those constituent concepts as part of their structure. For example, CNNs have internal representations of each layer of features they learn in the process of classifying an image. Without such an internal representation all you have is some observed behaviour and some vague claims about understanding this or learning that, at which point you can claim anything you like.

> A word embedding model represents information about the context in which tokens (words, sentences, etc) are found but it does not, in and of itself, represent the meaning of words. The only reason why we know that words it places in the general vicinity of each other have similar meaning is because we already understand meaning and we can interpret the results. But the model itself does not have anything like "understanding".

This is a mostly meaningless semantic distinction. I can ask you to give a synonym for "king" and you might suggest ruler, lord, or monarch. I can ask a word2vec model for a synonym for "king" and it will provide similar suggestions. What "understanding" of the words' meanings do you have that the model lacks? Be specific!

Definitions are abstract concepts, so the fact that you can pick similar words and so can the model are equivalent. To put it differently:

>The only reason why we know that words it places in the general vicinity of each other have similar meaning is because we already understand meaning and we can interpret the results.

Is not correct. The only reason why we know that the words it places in the general vicinity of each other have similar meanings is because our mental models put the same words in the same vicinities.

>Same thing here. You seem pretty certain that with more data (perhaps with a deeper model) you can represent something that the model doesn't have an internal representation for. But just because the behaviour of the model partially matches the behaviour of a system that does have an internal representation for such a thing, in other words, a human, that doesn't mean that the model also behaves the way it behaves because it models the world in the same way that the human does.

This doesn't matter. Just because the model's internal representation of a concept doesn't map obviously to the way you understand it doesn't mean that the model doesn't have a representation of that concept. Word2vec models do represent concepts. We can interpolate along conceptual axes in word2vec spaces. That's as close to an internal representation of an isolated concept as you're gonna get. Like, I can ask a word2vec model how "male" or "female" a particular term is, and get a (meaningful!) answer. We never explicitly told the word2vec model to monitor gender, but it can still provide answers because that information is encoded.
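
The arithmetic behind "how male or female is a term" can be sketched as a projection onto a direction in the embedding space. The two-dimensional vectors below are hand-made toy values, not output from a trained word2vec model, so the sketch shows only the mechanics and proves nothing about what a real model encodes.

```python
from math import sqrt

# Hand-made toy vectors (an assumption for illustration, not trained values).
toy_vectors = {
    "man": [1.0, 0.2],
    "woman": [-1.0, 0.2],
    "king": [0.9, 0.9],
    "queen": [-0.9, 0.9],
    "table": [0.0, -0.5],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gender_score(word):
    """Project a word vector onto the unit-length 'man minus woman' axis."""
    axis = [m - w for m, w in zip(toy_vectors["man"], toy_vectors["woman"])]
    length = sqrt(dot(axis, axis))
    return dot(toy_vectors[word], axis) / length


for word in ("king", "queen", "table"):
    print(word, gender_score(word))
```

With real embeddings the axis would be built the same way from trained vectors. Interpreting the resulting score is still left to the person reading it.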

>Without such an internal representation all you have is some observed behaviour and some vague claims about understanding this or learning that, at which point you can claim anything you like.

Again, who cares? If it passes a relevant "turing test", what does your quibble about the internal representation not being meaningful enough to you matter? Clearly there's an internal representation that's powerful enough to be useful. Just because you can't understand it at first glance doesn't make it not real.

To address another one of your comments:

> Hi. From your write-up and a quick look at your notebook that's what your model is doing. And you measure its accuracy as its ability to do so. Is that incorrect?

Neither I nor the person you responded to is the author. But yes, this understanding is incorrect. It is indeed trained on historic picks, but this is not the same thing as reproducing a deck that it has seen before. To illustrate, imagine that the training set of ~2000 datapoints had 1999 identical situations, and 1 unique one.

The unique one is "given options A and history A', pick card a". The other 1999 identical ones are "given options A and history B', pick b" (yes this is as intended). A model trained to exactly reproduce a deck it had seen previously would pick "a". The model in question would (likely, depending on the exact tunings and choices) pick "b".

This bias towards the mean is intentional, and is completely different than "trying to recreate an exact deck it's seen before", which isn't a thing you normally do outside of autoencoders and as others have mentioned, doesn't make much sense.
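
The 1999-versus-1 example above can be sketched by contrasting an exact lookup table with a deliberately oversimplified "majority" model that ignores history altogether. Both models are hypothetical stand-ins: the article's network conditions on history in a learned way, so this only exaggerates the pull towards the common case.

```python
from collections import Counter

# One unique situation and 1999 identical ones, as in the example above.
training = [("A", "A'", "a")] + [("A", "B'", "b")] * 1999

# Memoriser: exact lookup on (options, history).
memorised = {(options, history): pick for options, history, pick in training}


def memoriser(options, history):
    return memorised[(options, history)]


# Majority model: most common pick for these options, history ignored.
pick_counts = {}
for options, _history, pick in training:
    pick_counts.setdefault(options, Counter())[pick] += 1


def majority_model(options, history):
    return pick_counts[options].most_common(1)[0][0]


exact = memoriser("A", "A'")
smoothed = majority_model("A", "A'")
print(exact, smoothed)
```

Scored against the recorded picks, the majority model gets the unique situation "wrong", even though it never tried to reproduce any particular deck.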

>> This is a mostly meaningless semantic distinction. I can ask you to give a synonym for "king" and you might suggest ruler, lord, or monarch. I can ask a word2vec model for a synonym for "king" and it will provide similar suggestions. What "understanding" of the words' meanings do you have that the model lacks? Be specific!

Why do you need to admonish me to be specific?

word2vec can only represent meaning by mapping words to other words. I have a human understanding of language that goes well beyond that. For example, I don't need to limit myself to synonyms of king- I can use circumlocution: "a king is the hereditary monarch leading a monarchist nation". word2vec can tell you which of those words are close to king, in its model, but it can't put together this simple sentence that describes their relation.

Not to mention I can generate and recognise who knows how many more representations of the concept "king" than word2vec can. I can draw you a cartoon of a king, or rather, an unlimited number of them, each different than the other. I can sing you a song about kings. I can write you a poem. I can dance you an interpretive dance about kings.

I don't know if you really think that word2vec is really as good as a human at representing meaning, but, just in case: it's not even close.

>> Again, who cares? If it passes a relevant "turing test", what does your quibble about the internal representation not being meaningful enough to you matter? Clearly there's an internal representation that's powerful enough to be useful. Just because you can't understand it at first glance doesn't make it not real.

What is that internal representation?

This is a lot of anger over a choice of vocabulary. The author certainly didn’t mean that the model had a deep understanding of the nuances of the concept of color - just that it had identified clusters that in real life correspond to the color of cards.
My wording is stronger than I intended it (I don't feel that strongly about Magic), but the author did claim that his model "definitely understands the concept of colour", that it is "better than humans" and made several other strong claims besides.