by TekMol 2933 days ago

    How difficult would it be for a bot
    to simulate a human's mouse movements?
Very very hard. Because the bot author does not have the giant database that FB has to analyze how humans move the mouse around. Also the bot author does not know which aspects FB looks at to determine if it's a human.

And even if the bot author had all that information, it would still be super hard to write an AI that accomplishes a given task in a way that successfully mimics a human. It would mean winning a 'mouse Turing test'.

    Shouldn't there be an API for that?
What the API returns is under the control of the user. So the API does not help FB to fingerprint you.

This issue touches on the real privacy problem the net is facing. It's not the wrong cookies or privacy policies. It's fingerprinting. There is no technical solution to it.


> Very very hard. Because the bot author does not have the giant database that FB has to analyze how humans move the mouse around.

Don't forget that Facebook's false positive rate should be very low. There are lots of humans on their platform, and they should all pass the test.

This makes it easier to construct a bot that will pass the test.

It won't hurt if humans fail the test every so often, as long as the failure rate stays under a threshold that humans can regularly overcome.
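To make the "low false positive rate" point concrete, here's a minimal sketch assuming the detector boils each session down to a single hypothetical "looks human" score (higher = more human-like). If the cutoff is chosen so that only a small share of known humans get rejected, it necessarily sits low in the human distribution, and a bot only has to beat that cutoff:

```python
import math

def pick_threshold(human_scores, max_false_positive_rate):
    """Lowest score that still passes, so at most the given share of known humans fail.

    human_scores: hypothetical "looks human" scores (higher = more human-like)
    for sessions already known to be human.
    """
    if not human_scores:
        raise ValueError("need at least one human score")
    ranked = sorted(human_scores)
    # How many known humans we are willing to reject.
    allowed = math.floor(max_false_positive_rate * len(ranked))
    # Clamp so a rate of 1.0 doesn't index past the end.
    return ranked[min(allowed, len(ranked) - 1)]

def passes(score, threshold):
    """A session passes if its score is at or above the cutoff."""
    return score >= threshold
```

With made-up scores `[0.2, 0.5, 0.6, 0.7, 0.8, 0.9, 0.91, 0.95, 0.97, 0.99]` and a 10% rejection budget, the cutoff lands at 0.5, so a bot scoring 0.55 passes even though it looks less human than most real users.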

I can imagine it would be easy to trick the system a few times (either as a bot pretending to be human, or a human acting like a bot), but tricking it consistently over months or years is going to be damn near impossible.
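A rough way to see why consistency is the hard part, assuming (a big simplification) that every check is independent and the bot passes each one with the same hypothetical probability p. Even p = 0.99 collapses over n = 1000 checks:

$$P(\text{pass all } n) = p^{n}, \qquad 0.99^{1000} = e^{1000 \ln 0.99} \approx e^{-10.05} \approx 4.3 \times 10^{-5}$$

Real checks are unlikely to be independent, so this only illustrates how repetition compounds, not an actual detection rate.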

Also, don't forget that Facebook probably has to do all detection in JavaScript on the client, i.e. with limited resources. I suspect they don't send every mouse movement to the server. This also means they probably don't have fine-grained historical data.
Not necessarily.

I've only given it a few minutes' thought, but position and time data is really small and easy to compress (you don't need to send anything while the user isn't moving the mouse). If it's sent in batches or over an already-open WebSocket, it's not like it's using a ton of resources on the client.
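A minimal sketch of why this is cheap, assuming a hypothetical batch format of `(t_ms, x, y)` samples recorded only while the mouse moves: store each sample as a delta from the previous one (small, repetitive numbers) and run the batch through `zlib`. How the batch gets to the server (WebSocket or otherwise) is left out.

```python
import json
import zlib

def encode_batch(samples):
    """Delta-encode a batch of (t_ms, x, y) mouse samples and zlib-compress it.

    samples: time-ordered tuples; idle periods are assumed to be absent,
    so they cost nothing.
    """
    deltas = []
    prev = (0, 0, 0)
    for t, x, y in samples:
        # Store the change from the previous sample; small values compress well.
        deltas.append((t - prev[0], x - prev[1], y - prev[2]))
        prev = (t, x, y)
    raw = json.dumps(deltas, separators=(",", ":")).encode("ascii")
    return zlib.compress(raw, 9)

def decode_batch(blob):
    """Invert encode_batch, returning absolute (t_ms, x, y) tuples."""
    deltas = json.loads(zlib.decompress(blob))
    samples, t, x, y = [], 0, 0, 0
    for dt, dx, dy in deltas:
        t, x, y = t + dt, x + dx, y + dy
        samples.append((t, x, y))
    return samples
```

For a smooth 200-sample drag sampled every 16 ms, the deltas are almost all identical, so the compressed batch is a tiny fraction of the raw JSON.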

Assuming all of their users (guessing a billion daily active users) are on desktop half of the time (a wildly incorrect assumption, I'm sure), and the mouse position data is 1 MB per person for the data you care about (which, again, seems like a lot), that's 500 TB.

For $25k you could store it all. That's nothing compared to the benefits of being able to identify bots on your platform.
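Spelling out the back-of-envelope numbers above (decimal units, 1 TB = 10^6 MB, no replication or overhead counted):

$$10^{9} \times 0.5 \times 1\ \text{MB} = 5 \times 10^{8}\ \text{MB} = 500\ \text{TB}, \qquad \frac{\$25{,}000}{500\ \text{TB}} = \$50\ \text{per TB}$$

So the $25k figure implies about $50 per terabyte of raw storage, presumably before redundancy and operating costs.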

Yes, a few years back the standard way to do this for conversion optimization was to RLE-compress the data and send it in intervals. Also, the measurements don't need millisecond resolution.
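What exactly those tools run-length encoded isn't stated, so this is just one plausible reading, as a sketch: drop millisecond precision by keeping one position per (assumed) 50 ms bucket, then collapse runs of identical movement steps into a single `[count, d_bucket, dx, dy]` entry.

```python
from itertools import groupby

def quantize(samples, bucket_ms=50):
    """Keep at most one (x, y) per time bucket; bucket_ms is an assumed resolution.

    samples: time-ordered (t_ms, x, y). Returns [(bucket_index, x, y)],
    using the last sample seen in each bucket.
    """
    last = {}
    for t, x, y in samples:
        last[t // bucket_ms] = (x, y)
    return [(b, x, y) for b, (x, y) in sorted(last.items())]

def rle_steps(points):
    """Run-length encode the movement steps between quantized points.

    Consecutive identical (d_bucket, dx, dy) steps collapse into one
    [count, d_bucket, dx, dy] entry.
    """
    steps = [
        (b1 - b0, x1 - x0, y1 - y0)
        for (b0, x0, y0), (b1, x1, y1) in zip(points, points[1:])
    ]
    return [[len(list(run)), *step] for step, run in groupby(steps)]
```

A steady horizontal move sampled every 10 ms for a second reduces to 20 buckets and then to a single RLE entry.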
You could probably start with recording all of your mouse movements over a period of a month or so. Record speed, acceleration, how straight each move is, how much each move deviates from a straight line, how many times the mouse movement stops along its way to its target, where you place the mouse when you scroll, etc.
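Most of the metrics in that list can be computed from raw samples in a few lines. A minimal sketch, assuming each move has already been segmented into `(t_ms, x, y)` samples from start to target; the stop cutoff `PAUSE_SPEED` is an arbitrary assumption, and scroll position is left out:

```python
import math

PAUSE_SPEED = 0.05  # px/ms; assumed cutoff below which the cursor counts as stopped

def move_features(points):
    """Summarize one mouse move given time-ordered (t_ms, x, y) samples."""
    if len(points) < 2:
        raise ValueError("need at least two samples")
    path_len = 0.0
    vel = []  # (segment midpoint time in ms, speed in px/ms)
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        d = math.hypot(x1 - x0, y1 - y0)
        path_len += d
        if t1 > t0:
            vel.append(((t0 + t1) / 2, d / (t1 - t0)))
    # Acceleration: change in speed between consecutive segments.
    accel = [
        (v1 - v0) / (m1 - m0)
        for (m0, v0), (m1, v1) in zip(vel, vel[1:])
        if m1 > m0
    ]
    (_, xs, ys), (_, xe, ye) = points[0], points[-1]
    direct = math.hypot(xe - xs, ye - ys)
    # Largest perpendicular distance of any sample from the straight start-to-end line.
    if direct:
        deviation = max(
            abs((xe - xs) * (ys - y) - (xs - x) * (ye - ys)) / direct
            for _, x, y in points
        )
    else:
        deviation = 0.0
    speeds = [v for _, v in vel]
    # Count moving-to-stopped transitions along the way.
    stops = sum(1 for a, b in zip(speeds, speeds[1:]) if a >= PAUSE_SPEED > b)
    return {
        "mean_speed": sum(speeds) / len(speeds) if speeds else 0.0,
        "max_speed": max(speeds, default=0.0),
        "mean_abs_accel": sum(abs(a) for a in accel) / len(accel) if accel else 0.0,
        "straightness": direct / path_len if path_len else 1.0,  # 1.0 = perfectly straight
        "max_deviation": deviation,
        "stops": stops,
    }
```

Collected over a month, per-move dictionaries like this are what you'd aggregate into a profile of how your own mouse behaves.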

Using these metrics, you could probably start to derive characteristics of how your mouse behaves based on what you are doing and where you are moving it.

This could then probably be used to build some form of algorithm that moves the mouse for you with noise (accelerating up & down along the way, deviation from a straight line, stopping in the middle of the line, etc.).
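A hand-tuned sketch of that kind of noisy mover (not learned from data): the curved path is a quadratic Bezier with a random sideways control point, "accelerating up & down" is a smoothstep easing curve, and there is small jitter plus an optional mid-way stop. Every parameter here is a hypothetical value chosen for illustration; it only produces coordinates and does not drive a real cursor.

```python
import math
import random

def synthetic_move(start, end, duration_ms, rng, step_ms=10,
                   max_bend=0.2, jitter_px=0.5, pause_prob=0.3, pause_ms=120):
    """Generate time-ordered (t_ms, x, y) samples from start to end."""
    (x0, y0), (x1, y1) = start, end
    dx, dy = x1 - x0, y1 - y0
    dist = math.hypot(dx, dy)
    # Deviation from a straight line: push the Bezier control point sideways.
    bend = rng.uniform(-max_bend, max_bend) * dist
    nx, ny = (-dy / dist, dx / dist) if dist else (0.0, 0.0)
    cx, cy = x0 + dx / 2 + nx * bend, y0 + dy / 2 + ny * bend
    # Optionally stop once somewhere in the middle of the move.
    pause_at = rng.uniform(0.3, 0.7) if rng.random() < pause_prob else None
    samples, t, paused = [], 0, False
    steps = max(1, duration_ms // step_ms)
    for i in range(steps + 1):
        u = i / steps
        # Smoothstep easing: slow start, fast middle, slow finish.
        s = u * u * (3 - 2 * u)
        x = (1 - s) ** 2 * x0 + 2 * (1 - s) * s * cx + s ** 2 * x1
        y = (1 - s) ** 2 * y0 + 2 * (1 - s) * s * cy + s ** 2 * y1
        if 0 < i < steps:  # jitter interior points only, so the move ends on target
            x += rng.gauss(0, jitter_px)
            y += rng.gauss(0, jitter_px)
        samples.append((t, round(x), round(y)))
        if pause_at is not None and not paused and u >= pause_at:
            t += pause_ms  # cursor sits still; no samples during the gap
            paused = True
        t += step_ms
    return samples
```

Passing a seeded `random.Random` makes runs reproducible; feeding the output through a feature extractor like the one above is a quick way to compare it against your own recorded moves.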

I was thinking more of a machine-learning approach, e.g. using a GAN.
Many ways to skin a cat. Not an impossible thing to solve in my book, but I don't deny that it would probably be difficult.
I was thinking the same. Maybe we should do it together...
All those "the attacker won't have the knowledge" arguments underestimate that the attacker can simply run their own websites that track exactly the same things, and thereby gain the same knowledge.

You need to break reCAPTCHA? Simple: you implement your own captcha on a frequently used site of your own, and whenever you need to solve one, you copy the challenge and present it to one of your users.

Same with recording mouse data.

It's an old idea even, very similar to https://xkcd.com/792/

> Very very hard. Because the bot author does not have the giant database that FB has

You don't need to learn from all humans. You need to learn from very few (or just even one).

Not all problems are machine learning problems.

Not only that, but this can be made a machine learning problem if needed. I'm a human, so if I train my computer to mimic my own mouse movements, that's sufficient to fool Facebook. Well, now I realize this is not as easy as it sounds, since, as others pointed out, we don't know how fine-grained Facebook's data is or what they're paying attention to. I'm just saying that, theoretically, I should be able to train my agent to act just like me.
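As a small illustration of "learn from just one person", here's a sketch that fits Fitts's law, a standard model of human pointing time (MT = a + b·log2(D/W + 1)), to your own recorded moves and then samples durations with your own noise level. Choosing Fitts's law is my assumption, not something anyone here proposed, and it says nothing about what Facebook actually checks; the `(distance_px, target_width_px, duration_ms)` input format is hypothetical.

```python
import math
import random
import statistics

def index_of_difficulty(distance, width):
    # Shannon formulation of Fitts's law: ID = log2(D / W + 1)
    return math.log2(distance / width + 1)

def fit_fitts(moves):
    """Least-squares fit of MT = a + b * ID to one user's recorded moves.

    moves: hypothetical (distance_px, target_width_px, duration_ms) tuples.
    Returns (a, b, residual_spread).
    """
    ids = [index_of_difficulty(d, w) for d, w, _ in moves]
    mts = [mt for _, _, mt in moves]
    mean_id, mean_mt = statistics.mean(ids), statistics.mean(mts)
    var = sum((i - mean_id) ** 2 for i in ids)
    if var == 0:
        raise ValueError("need moves with different difficulties")
    b = sum((i - mean_id) * (m - mean_mt) for i, m in zip(ids, mts)) / var
    a = mean_mt - b * mean_id
    residuals = [m - (a + b * i) for i, m in zip(ids, mts)]
    return a, b, statistics.pstdev(residuals)

def sample_duration(model, distance, width, rng):
    """Duration for a new move: the fitted trend plus noise at your own level."""
    a, b, spread = model
    return max(1.0, a + b * index_of_difficulty(distance, width) + rng.gauss(0, spread))
```

The sampled duration could then drive a path generator, so both the shape and the timing of each move come from one person's data.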
I don't think it should be that difficult. Project for a weekend hackathon or so. Collect the same mouse movement data with JS on your own website if you own anything mildly popular, or partner with someone and buy it from them, stick it into some off the shelf GAN, job's done. Turing test is broken by modern deep learning.

At least the mouse movements themselves shouldn't be difficult to do given a source of data. Simulating clicks on the same FB UI elements as real people, with the same statistical properties, on the other hand, is where you might lack the data to do it properly.

> stick it into some off the shelf GAN, job's done. Turing test is broken by modern deep learning

This is a cartoon version of deep learning.