Hacker News
by gradys 3238 days ago
> I think a big component is not really machine learning but more related to how to represent state at any given time, which will necessarily involve a lot of human-tweaking of distilling down what really are the important things that influence winning.

I agreed with everything you said until here. Developing good representations of state is precisely what today's machine learning is so good at. This is the key contribution of deep learning.

You seem to be supposing that a human expert is going to be carefully designing a set of variables to track, and in doing so conveying what features of the input to pay attention to and what can be ignored. Presumably the ML can then handle figuring out the optimal action to take in response to those variables.

I think it's much more likely to be the other way around. ML is really good at taking high-dimensional input with lots of noise and figuring out how to map that to meaningful (to it, if not to us) high-level variables. In other words, modern AI is good at perception.

What it's significantly less good at compared to humans is what might formally be called the policy problem. Given high-level variables that describe the situation, what's the best course of action? This involves planning. We think of it in terms of breaking the problem into sub-objectives, considering possible courses of action, decomposing a high-level plan into a sequence of directly executable actions, etc. AIs might "think" of this problem in different terms than these, but it seems like they still have to do this kind of work if they are going to have a chance to succeed.

We don't have obvious ways to model this part of the problem. For the perception/representation building problem, I can almost guarantee the solution is going to be a ConvNet to process individual frames combined with a recurrent layer to track state over time. On the other hand, I'm seeing some plausible solutions to the policy problem emerging in the literature, but it's still very much an open question what will emerge as the go-to. In AlphaGo, this part of the problem is where they brought in non-ML algorithmic solutions like Monte Carlo tree search, and one of the reasons StarCraft is interesting compared to Go is that those algorithmic solutions are harder to apply.
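
To make the "ConvNet per frame plus a recurrent layer" idea concrete, here is a minimal pure-Python sketch. It is a toy under stated assumptions, not anything from AlphaGo or any StarCraft agent: there is a single hand-picked kernel instead of learned filters, one scalar feature per frame, and an Elman-style scalar state. All function names and weights are hypothetical.

```python
import math

def conv2d_valid(frame, kernel):
    """Valid (no padding) 2D cross-correlation of one frame with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(frame) - kh + 1
    out_w = len(frame[0]) - kw + 1
    return [
        [
            sum(frame[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def frame_feature(frame, kernel):
    """Conv -> ReLU -> global average pool: one scalar feature per frame."""
    fmap = conv2d_valid(frame, kernel)
    cells = [max(0.0, v) for row in fmap for v in row]
    return sum(cells) / len(cells)

def recurrent_step(h, x, w_h=0.5, w_x=1.0):
    """Elman-style update: the new state mixes the old state and the new input.
    The weights are hand-picked for illustration, not learned."""
    return math.tanh(w_h * h + w_x * x)

def encode_frames(frames, kernel):
    """Run the per-frame feature extractor and carry state across frames."""
    h = 0.0
    for frame in frames:
        h = recurrent_step(h, frame_feature(frame, kernel))
    return h
```

With a horizontal-difference kernel such as `[[1, -1]]`, a frame containing a vertical edge produces a positive feature, and the recurrent state stays nonzero on a later blank frame. That is the sense in which the recurrent part "tracks state over time".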


i feel like you misunderstood that part of the argument.

he is saying representing the state is very hard, and you are saying: given a well-represented state, ML is very good at finding the important features, reducing the dimensionality, and finding mathematical transformations, etc.

deep learning has been so successful with images because representing them is trivial - flattened pixel vector.

as for your last paragraph: in starcraft, that raises some questions about what rules the AI is going to adhere to.

in SC, you don't view the entire board. you view the minimap / hear noises and alerts and decide where to focus your attention on the map. in battle, being able to click and accurately place attacks quickly is important.

Do you give the computer a full view of everything it would be able to see? does the computer get to make 10 million clicks per second, so that essentially every action is like hitting pause and then making the next action?
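
One way to picture a rule against "10 million clicks per second" is a cap on how many actions the agent may issue per time window. The sketch below is a hypothetical sliding-window limiter, not how any real StarCraft environment enforces its limits; the class name, parameters, and the idea of passing in the game clock as `now` are all assumptions.

```python
from collections import deque

class ActionRateLimiter:
    """Sliding-window cap on actions (hypothetical helper).

    With the default 60-second window, `max_actions` is an APM cap.
    """

    def __init__(self, max_actions, window_seconds=60.0):
        self.max_actions = max_actions
        self.window = window_seconds
        self.timestamps = deque()  # times of actions still inside the window

    def try_act(self, now):
        """Return True and record the action if the cap allows it at time `now`."""
        # Drop actions that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_actions:
            self.timestamps.append(now)
            return True
        return False
```

An agent loop would call `try_act(game_time)` before each click and fall back to a no-op when it returns False, which keeps the comparison with a human closer to fair.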

I was actually assuming the input representation would just be a video stream, which (combined with audio) is enough for human players, but looking more into it, it's a lot more than a video feed[1].

It feels a little like cheating, but I guess processing the game UI video feed isn't the interesting part of the problem. Plus, it makes the problem much more accessible to hobbyists who can't afford the GPU cluster required to productively experiment on models that process streams of 1080p video.

Still, in principle, I think modern ML modeling approaches could handle the problem of transforming the video feed into a useful high level state representation. I don't think I misunderstood the OP in that regard at least.

[1] - https://github.com/deepmind/pysc2/blob/master/docs/environme...

Using just the video feed, the AI would be required to reconstruct an overview of the strategic situation, and then develop a forward strategy on top of that involving individual units. In a much simpler game like Doom, video-only input is enough for strategies like "see an enemy, target and shoot it as fast as possible".

For an AI to be able to effectively compete in a complex game like SC2, preparing high-level inputs is important. Think of these as shortcuts: heuristic approximations of tasks that would be hard to represent and train with deep learning. I would guess an implementation would need multiple independent nets for various tasks, combined with heuristics. Then each could be separately trained to do the given task.

People should just read the article, I think. It answers all the things you are debating (limit on APM, what features are used, what models they already tried and how well they perform).
> a ConvNet to process individual frames combined with a recurrent layer to track state over time.

> are harder to apply

That's an understatement: StarCraft is immune to a Monte Carlo approach or anything based on analyzing pixel data. The tree state of an actual battle has thousands of choices per unit per second with minor variations in location; there is no discrete state like a chessboard (at best millions of cells). Viewing the game at a low level (pixels) creates a gigantic amount of data. Units constantly move/attack/die and get blocked by other units/terrain.

Predicting an enemy move (MC simulation) will be impossible, and you can easily make several moves per second (even at 120-140 APM). That means: 1. you need real-time response; unlike Go, there isn't a time buffer to decide; 2. you always need to react at the current time (or allow the enemy to advance); 3. there are very few "good moves" in StarCraft (moving randomly on the "board" will just waste time), so MC simulation will miss them more than 99% of the time due to randomness.

The MC approach is vastly inferior in this case. I think they'll be forced to operate on higher-level strategy rather than just microing every unit optimally (i.e. treating it like chess in real time). Brute-forcing billions of potential moves simply won't work.
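
The arithmetic behind the "random simulation misses the few good moves" point can be sketched in a few lines. This assumes uniform random draws with replacement and independent per-unit choices; both are simplifications, and any numbers plugged in are illustrative rather than measured from the game.

```python
def hit_probability(good_moves, total_moves, samples):
    """Chance that at least one of `samples` uniform random draws is a good move."""
    miss = 1.0 - good_moves / total_moves
    return 1.0 - miss ** samples

def joint_action_count(choices_per_unit, units):
    """Size of the joint action space when every unit picks independently."""
    return choices_per_unit ** units
```

For example, `joint_action_count(10, 20)` is already 10**20 joint actions for a single decision step, and if only 1 move in 1000 is good, `hit_probability(1, 1000, 100)` stays below 10% even after 100 random samples.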

>Brute-forcing billions of potential moves simply won't work.

The problem is all AI/ML is essentially recorded, recursive, constrained brute forcing.

You can apply it at a higher level, like that guy who brute-forced the 7-roach rush for Zerg in SC2: http://lbrandy.com/blog/2010/11/using-genetic-algorithms-to-... The problem is that build orders just optimize the opening economy, and these unbalanced openings will just be patched out in the future.
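
For a feel of what searching build orders with a genetic algorithm looks like, here is a toy sketch. It is not the method from the linked post: the economy model, costs, action names, and GA settings below are all made up for illustration and have nothing to do with real SC2 numbers.

```python
import random

ACTIONS = "WA"  # W = build a worker, A = build an army unit (toy actions)

def simulate(build_order, worker_cost=50, army_cost=75, income_per_worker=10):
    """Toy economy: returns how many army units the build order produces.
    An action that cannot be afforded is simply wasted."""
    minerals, workers, army = 50, 4, 0
    for action in build_order:
        minerals += workers * income_per_worker
        if action == "W" and minerals >= worker_cost:
            minerals -= worker_cost
            workers += 1
        elif action == "A" and minerals >= army_cost:
            minerals -= army_cost
            army += 1
    return army

def evolve(length=12, pop_size=30, generations=40, mutation_rate=0.1, seed=0):
    """Evolve build orders; returns the best one and the best score per generation."""
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ACTIONS) for _ in range(length))
           for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        pop.sort(key=simulate, reverse=True)
        best_history.append(simulate(pop[0]))
        elite = pop[: pop_size // 2]  # fitter half survives unchanged (elitism)
        children = []
        while len(elite) + len(children) < pop_size:
            mom, dad = rng.sample(elite, 2)
            cut = rng.randrange(1, length)
            child = mom[:cut] + dad[cut:]  # single-point crossover
            child = "".join(
                rng.choice(ACTIONS) if rng.random() < mutation_rate else gene
                for gene in child
            )  # per-gene mutation
            children.append(child)
        pop = elite + children
    pop.sort(key=simulate, reverse=True)
    best_history.append(simulate(pop[0]))
    return pop[0], best_history
```

Calling `best, history = evolve()` returns a 12-step build order and a score history that never decreases, because the fitter half is carried over unchanged each generation. Note that the search only optimizes the scripted opening, which is the limitation raised above.
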
I'd argue modern AI is sort of terrible at taking high-dimensional data and finding an effective representation of it. It works better than a lot of other methods in ML, but as far as I know pure reinforcement learning applications are sort of lackluster, and even dimensionality reduction success stories tend to rely on scrubbed, careful data treatment by people.