by Detry322 2570 days ago
Paper author here: one cool thing about our technique is that our bot doesn't keep "notes" :)

The only thing it maintains for the entire game is this length-60 belief vector - a summary of who it thinks is evil and good. How people act influences this belief vector, but it can't look back at the game history. This leads to awkward play sometimes - it will propose missions that have failed in the past, etc. I think it's cool that we (humans) can summarize the state of the game with so little information, and that the bot does something similar :)
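
A length of 60 is consistent with counting role assignments in a five-player game that has Merlin, an Assassin, one other evil player, and two interchangeable good players (5 x 4 x 3 = 60). That role setup is an assumption made here for illustration, and the names below are hypothetical. A minimal sketch of the count:

```python
from itertools import permutations

NUM_PLAYERS = 5  # assumed five-player game

def role_assignments(num_players=NUM_PLAYERS):
    """Enumerate (merlin, assassin, minion) seat triples.

    The two seats left over are plain good players, which are
    interchangeable, so they do not multiply the count.
    """
    return list(permutations(range(num_players), 3))

assignments = role_assignments()
print(len(assignments))  # 60
```

Each entry of the belief vector would then be the probability of one of these assignments.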


Interesting, but surely it has to retain some information about the past to be able to know how to update that belief vector? Outside of directly being on a mission, the only major "good" or "evil" actions are voting for/against missions, once you know if the mission succeeded or failed. Do you just not take that into account?

It's possible this is in the paper - a lot of the more math/modeling parts went a bit over my head, so feel free to point me to a section to specifically read if I missed out.

If it's literally just a representation of the outcomes of the missions and who went on them, then isn't the Belief Vector just the venn diagram of how every mission went with some iterative statistics laid over it? I would have assumed any regular/competitive players would be fairly good at keeping that mental model themselves, which makes it confusing to me that the Agent would be better than that. Unless it's essentially just saying that the game is better if you play purely logically and ignore all context, which defeats the fun of playing it?

> Interesting, but surely it has to retain some information about the past to be able to know how to update that belief vector?

The belief vector is updated on the fly. When a player makes a move, we combine the current belief vector with our CFR-generated move probabilities using Bayes' rule. Once the belief vector is updated, we throw out all information related to the specific move they took.
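
A toy sketch of that kind of update, not the paper's implementation: `bayes_update` is a hypothetical helper, the belief has four entries instead of 60, and the likelihood numbers are made up, standing in for the CFR-generated move probabilities.

```python
def bayes_update(belief, move_likelihood):
    """Return the posterior over role assignments after one observed move.

    belief[i]          = P(assignment i | everything seen so far)
    move_likelihood[i] = P(observed move | assignment i)
    """
    unnormalized = [b * l for b, l in zip(belief, move_likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observed move is impossible under every assignment")
    return [u / total for u in unnormalized]

# Toy example: four hypotheses, uniform prior, made-up likelihoods.
belief = [0.25, 0.25, 0.25, 0.25]
likelihood = [0.9, 0.1, 0.5, 0.5]
belief = bayes_update(belief, likelihood)
# The move itself can now be discarded; only `belief` is carried forward.
```

After the update, the first hypothesis holds 0.45 of the probability mass and the second 0.05, and nothing about the move itself needs to be stored.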

> Outside of directly being on a mission, the only major "good" or "evil" actions are voting for/against missions, once you know if the mission succeeded or failed. Do you just not take that into account?

DeepRole takes all player actions into account - the key to good performance in Avalon is knowing how to interpret the voting/proposal actions of all the players. We explored this in our paper: LogicBot only uses the mission fail results to deduce who is good, and has a lower win rate than DeepRole in all situations.

> If it's literally just a representation of the outcomes of the missions and who went on them, then isn't the Belief Vector just the venn diagram of how every mission went with some iterative statistics laid over it?

While you can tease the "venn diagram" aspect out of the belief vector (it will assign 0 probability to impossible assignments), it's far richer than that - it weights the possible assignments based on all of the moves it has observed.
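
To illustrate only the "impossible assignments get 0 probability" part, here is a minimal sketch. It assumes a five-player game where an assignment is a (merlin, assassin, minion) seat triple and only evil players can fail a mission; `prune_after_mission` is a hypothetical name, and the weighting by votes and proposals described above is deliberately left out.

```python
from itertools import permutations

# Assumed encoding: (merlin, assassin, minion) seats in a 5-player game.
ASSIGNMENTS = list(permutations(range(5), 3))

def prune_after_mission(belief, team, num_fails):
    """Zero out assignments with fewer evil players on the team than fails seen."""
    pruned = []
    for prob, (_merlin, assassin, minion) in zip(belief, ASSIGNMENTS):
        evil_on_team = len({assassin, minion} & set(team))
        pruned.append(prob if evil_on_team >= num_fails else 0.0)
    total = sum(pruned)
    if total == 0:
        raise ValueError("mission result is impossible under every assignment")
    return [p / total for p in pruned]

uniform = [1 / len(ASSIGNMENTS)] * len(ASSIGNMENTS)
after = prune_after_mission(uniform, team=(0, 1), num_fails=1)
possible = sum(1 for p in after if p > 0)  # 42 of the 60 assignments survive
```

This hard filter is the part a purely logical player could track; a belief vector that also reweights the surviving assignments by observed behavior carries more information.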

In some sense, DeepRole is playing with one of its hands tied behind its back. All it knows about the state of the game is this belief vector, the number of succeeds, the number of fails, and the proposal count. It doesn't know the specific moves that led to that point in the game. The fact that we can summarize everyone's previous moves into this belief vector is somewhat surprising, considering human players can look back at the game history and re-synthesize for new insights.
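
As a rough picture of how small that state is, here is a sketch of a container holding just the four items listed above. The class and field names are hypothetical, not taken from the DeepRole code.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class GameSummary:
    """Everything carried between moves in this sketch; no move history."""
    belief: Tuple[float, ...]  # one probability per role assignment
    succeeds: int              # missions that have succeeded so far
    fails: int                 # missions that have failed so far
    proposal_count: int        # proposals made in the current round

# Start of a game: uniform belief over 60 assignments, nothing played yet.
start = GameSummary(belief=tuple([1 / 60] * 60), succeeds=0, fails=0, proposal_count=0)
```

Note there is no field for past teams, votes, or proposers, which is why the awkward replays of failed missions can happen.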

I think you are exactly right. The algo is simply outmemorizing its human counterparts. This isn't a very good paper at all.

I don't think this is true. On ProAvalon, human players can see the full history of the game at all times [1], and use it to make decisions. DeepRole, on the other hand, can only use its internal belief state. This belief state is only a summary of what has happened in the game - DeepRole has no way of knowing who went on previous missions, or how people voted, or who proposed what. See above for more detail.

[1] See this video for an example: https://www.youtube.com/watch?v=LKdY4Us0Ci4