by CobrastanJorji 1074 days ago
Y'know, the thing I least like about these AI video game players is how unlike humans they look. I was wondering about the difference, and I think it comes down to two parts. First and foremost, human players generally prefer routes with a lot of tolerance for input error. Second, humans frequently take "mental planning breaks," stopping for a moment in safe spots before challenging areas.

I think you could juggle the heuristics to capture the preference for input-error tolerance. For ML training, you could just randomly vary input timing by up to 20ms or so to teach the algorithm to favor safer moves. For path finding, it's trickier, but there's probably a way to favor "wide" paths. I'm less sure how to express the second concept, pausing briefly in "safe areas," but I imagine it's maybe noticing a place where entering no inputs for a significant amount of time does not affect the results.
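
A minimal sketch of the timing-noise idea, assuming inputs are stored as (time in ms, button) pairs. The names `jitter_inputs`, `robust_score`, and the `evaluate` callback are hypothetical, not taken from the project in the video:

```python
import random

MAX_JITTER_MS = 20.0  # "up to 20ms or so", as suggested above


def jitter_inputs(inputs, max_jitter_ms=MAX_JITTER_MS, rng=None):
    """Return a copy of (time_ms, button) inputs with each time shifted randomly.

    Each press moves by a uniform offset in [-max_jitter_ms, +max_jitter_ms],
    is clamped at 0, and the result is re-sorted by time.
    """
    rng = rng or random.Random()
    noisy = []
    for time_ms, button in inputs:
        offset = rng.uniform(-max_jitter_ms, max_jitter_ms)
        noisy.append((max(0.0, time_ms + offset), button))
    noisy.sort(key=lambda event: event[0])
    return noisy


def robust_score(inputs, evaluate, trials=32, rng=None):
    """Average a hypothetical evaluate(inputs) -> float over jittered replays."""
    rng = rng or random.Random()
    total = 0.0
    for _ in range(trials):
        total += evaluate(jitter_inputs(inputs, rng=rng))
    return total / trials
```

Scoring a route by `robust_score` instead of a single clean replay should penalize frame-perfect jumps, since they fail in some of the noisy replays.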

8 comments

What you’re basically describing is bounded rationality, which has been widely studied in behavioral economics, psychology, and engineering applications (Simon and Gigerenzer are two big names to google). A common framework for formalizing it is as what boils down to versions of rate-distortion problems from information theory (very related to Bayesian statistics).

The reason it’s of engineering interest is, like you observe, bounded-rationality gives you solutions that are sub-optimal but more robust and often simpler.

Moreover, finding wide path solutions emerges naturally from sampling-based motion planners. These planners are asymptotically optimal, but if you terminate them early, they are more likely to give you a solution that goes through large gaps, not smaller ones, because it’s unlikely to sample a trajectory that goes through a tight space without heavy sampling. You could probably formulate that in the rate-distortion framework but I haven’t thought about how to do it precisely.
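
Here is a toy illustration of just the sampling argument. It is not a real motion planner: it assumes a wall with one wide and one narrow opening (made-up sizes) and draws crossing heights uniformly, stopping after a small sample budget:

```python
import random

# Hypothetical 1-D cross-section of a wall: two openings, given as (low, high).
WIDE_GAP = (0.10, 0.40)    # width 0.30
NARROW_GAP = (0.70, 0.72)  # width 0.02


def first_gap_found(rng, max_samples):
    """Sample crossing heights uniformly in [0, 1) and return the first gap hit.

    Returns "wide", "narrow", or None if the sample budget runs out.
    """
    for _ in range(max_samples):
        y = rng.random()
        if WIDE_GAP[0] <= y < WIDE_GAP[1]:
            return "wide"
        if NARROW_GAP[0] <= y < NARROW_GAP[1]:
            return "narrow"
    return None


def tally(runs=2000, max_samples=5, seed=0):
    """Count which gap an early-terminated search finds first, over many runs."""
    rng = random.Random(seed)
    counts = {"wide": 0, "narrow": 0, None: 0}
    for _ in range(runs):
        counts[first_gap_found(rng, max_samples)] += 1
    return counts
```

With a small budget, the first feasible sample lands in each gap in proportion to its width, so `tally()` should report the wide gap far more often than the narrow one.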

Oh cool! It's always hugely useful to learn the word for the thing you're thinking about. It can be really tricky to figure out if a vague idea has a name unless you're already pretty well read in a field. Now I have some reading to do, thanks!
This is actually a big issue with academic research related to bounded rationality. Although you could model it mathematically in another way, by far the most common is to use the rate-distortion approach. Rate distortion theory basically boils down to analyzing optimization problems of the form “minimize cost + (information-theoretic) entropy”. Problems of that form arise and are used for different reasons in fields including, e.g.: statistical mechanics, Bayesian statistics, anything in machine learning using softmax, large deviation theory, differential privacy, and, of course, bounded rationality and information theory.
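
As a small sketch of that form, assuming a finite set of actions with known costs and the sign convention where entropy is rewarded (minimize expected cost minus temperature times entropy): the minimizer is the softmax distribution. The function names and the `temperature` parameter are illustrative choices, not a standard API:

```python
import math
import random


def softmax_policy(costs, temperature):
    """Distribution p_i proportional to exp(-cost_i / temperature)."""
    lowest = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - lowest) / temperature) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


def objective(p, costs, temperature):
    """Expected cost minus temperature times Shannon entropy (in nats)."""
    expected_cost = sum(pi * ci for pi, ci in zip(p, costs))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return expected_cost - temperature * entropy


def random_distribution(n, rng):
    """A random probability vector, used only to compare against softmax."""
    raw = [rng.random() + 1e-9 for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]
```

Comparing `objective(softmax_policy(costs, t), costs, t)` against `objective` on random distributions gives a quick numerical check that no other distribution does better. A low temperature approaches the strictly cheapest action, while a high temperature spreads probability out.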

However, since all these fields refer to the same thing by different names, tools for handling problems in one field don’t get picked up by people working in another field. Either someone else rediscovers it later or someone has to have knowledge of multiple fields and see a connection. Sometimes the analysis done by one field isn’t useful in another due to different assumptions and research concerns, but that’s not obvious because you have to peel back a lot of layers of domain-specific jargon when reading the paper. Even though the math is very similar, reading a statistical mechanics paper written by a physicist is a real pain if you’re coming from an applied math / CS background, for example. Fields have their own notational conventions and refer to application scenarios that are meaningless to you, and you need to figure out whether the thing they reference matters to their development in the abstract.

It’s almost like reading House of Leaves. Here’s 30 pages with weird fonts describing the use of light in a non-existent movie and comparing it to both real movies and fake movies real people were supposedly involved in. Will it be relevant to the plot and thus require careful reading or can I skim this section? Maybe, but you won’t know unless you keep reading.

This is where ChatGPT usually shines. I gave it your comment and asked it to find the concept. I had to nudge it in the right direction though: https://chat.openai.com/share/f8855a35-7076-43e6-bf9e-19de8b...
> game players is how unlike humans they look.

I think you should watch some speed runners. They also don't look human, since they have some form of optimization in mind, compared to a casual player.

But, you will see that in most games RTA (Real Time Attack, speed running by humans, live) does choose different strategies than TAS (Tool Assisted Speedrun, still humans, but using tools to record and splice together a sequence of inputs for the game).

For a TAS you can justify taking fifty one-in-ten chances in a row, because every time it doesn't come off you just throw that away and re-record, so maybe you do a few hundred re-records for that section, not bad at all. In RTA that's never going to make any sense, it kills essentially 100% of runs.

> In RTA that's never going to make any sense, it kills essentially 100% of runs.

it depends on how much you want that world record.

It really doesn't. 10 to the 50 is an unimaginably huge number. If you could take this chance, once per second, for your whole lifetime, you've essentially no meaningful chance to succeed - you should do something else.
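
A back-of-the-envelope check of those numbers, assuming independent one-in-ten tricks and an 80-year lifetime (both assumptions for the sake of the arithmetic):

```python
import math

ODDS = 10       # each trick is a one-in-ten chance
SEGMENTS = 50   # fifty of them in a row

# RTA: every trick must succeed in a single uninterrupted attempt.
p_full_run = (1 / ODDS) ** SEGMENTS  # about 1e-50

# One attempt per second for an assumed 80-year lifetime.
attempts = 80 * 365.25 * 24 * 3600  # about 2.5e9

# P(at least one success) = 1 - (1 - p)^n, computed with log1p/expm1
# because 1 - 1e-50 rounds to exactly 1.0 in floating point.
p_ever = -math.expm1(attempts * math.log1p(-p_full_run))  # about 2.5e-41

# TAS: each trick is retried on its own until it lands, so the expected
# number of recordings per trick is ODDS (mean of a geometric distribution).
expected_tas_recordings = SEGMENTS * ODDS  # 500
```

So the TAS needs on the order of 500 recordings in total, while the lifetime RTA grind has roughly a 2.5e-41 chance of ever landing one full run.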
The tolerance for error is a huge thing! I also see (or intuit) people making tradeoffs between fault-tolerance and speed, e.g. taking a tight curve on a regular block and a wide one on a piranha plant.

That's more interesting and reassuring to watch. I think it's because the player's mind comes across in the playstyle. It's almost as if their entire history with the game is revealed.

I guess what's tangentially related is the tendency of AlphaGo (and friends) to favor an extremely likely 1 point win over a still likely (but not as much) 10 point win, making the endgame seem very non-human like. They still absolutely dominate human players but not with the point difference you'd expect a much stronger player to have.
I think the second issue would be pretty hard to solve without doing something artificial like slowing down the computer a ton, and then trying to come up with an algorithm to, like, delay for processing time.

It is just the nature of computers that they do simple things very fast. Humans do complex things very slowly (but can actually do them). This is why we are friends, we complement each other.

Although I do wonder, if the paths were not made to such tight tolerances (using your input delay solution), maybe AI Mario would spend a little longer lingering in areas just to let things align nicely for less-tight jumps.

It's really not a time issue. The question is whether the AI can pause mid-game in a "safe area" to deduce an optimal path, not just a greedy/lookback one.
For resting, you can build on your first idea: the random error increases with uninterrupted play time, and resting quickly lowers it.
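
Something like this, as a rough sketch. The class name and all the constants are made up for illustration, and the linear growth and recovery are just the simplest choice:

```python
import random


class FatigueModel:
    """Input jitter that grows with uninterrupted play and shrinks with rest."""

    def __init__(self, base_ms=5.0, growth_ms_per_s=0.5,
                 recovery_ms_per_s=5.0, max_ms=40.0):
        self.base_ms = base_ms
        self.growth_ms_per_s = growth_ms_per_s
        self.recovery_ms_per_s = recovery_ms_per_s
        self.max_ms = max_ms
        self.jitter_ms = base_ms  # current timing-error bound

    def play(self, seconds):
        """Active play: the error bound creeps up, capped at max_ms."""
        grown = self.jitter_ms + self.growth_ms_per_s * seconds
        self.jitter_ms = min(self.max_ms, grown)

    def rest(self, seconds):
        """Standing still: the error bound drops quickly, floored at base_ms."""
        rested = self.jitter_ms - self.recovery_ms_per_s * seconds
        self.jitter_ms = max(self.base_ms, rested)

    def perturb(self, time_ms, rng):
        """Shift one input time by a uniform offset within the current bound."""
        return time_ms + rng.uniform(-self.jitter_ms, self.jitter_ms)
```

Because recovery is ten times faster than growth here, an agent trained against this noise gets a large payoff from a one- or two-second pause before a tight section, which is roughly the "mental planning break" behavior.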
I don't think making humans comfortable is the goal with respect to AI. The goal is to actually solve a problem. Performance is second. Human comfort is a distant third or beyond.

When AI can reliably solve a problem without significant negative consequences from time to time, it's a win. How humans feel about the method is effectively irrelevant.

The near miss behavior is very much like overfitting. Mario is simple and deterministic enough that it doesn’t matter, but think about a scenario like a self driving car. A calculated near miss turns into a crash if the other car’s driver is just a little slower or their tires a little slicker than anticipated.
> I don't think making humans comfortable is the goal with respect to AI

According to whom?

AI in games has historically been all about human comfort/enjoyment. Extremely good AI that seems "unnatural" to humans is usually not the goal.

You seem to talk about AI-controlled-NPCs in games, while GP starts from the article context about AI-controlled-PCs (player characters) and proceeds to generalize about using AI to solve problems outside games.
I think the point is that different people have different goals for "AI"
I just couldn't bear all that walking and jumping into near misses in the video.
The AI doesn't care that it's one pixel away from death.

That said, I could see some highly-skilled players (like those who do speedruns) showing off their precision and adopting a similar "scare the audience" style for a new genre of competition.

Also just chilling with the Bullet Bill a few pixels away for half the run!