by amelius 2931 days ago
> Facebook said that it tracks mouse movements to help its algorithm distinguish between humans and bots.

Stupid cat and mouse game. How difficult would it be for a bot to simulate a human's mouse movements? I suppose not very difficult.

Also, doesn't this conflict with rare types of input devices? Or people with a motor function disability?

> to also determine if the window is foregrounded or backgrounded

Shouldn't there be an API for that?

8 comments

> How difficult would it be for a bot to simulate a human's mouse movements?

Very very hard. Because the bot author does not have the giant database that FB has to analyze how humans move the mouse around. Also the bot author does not know which aspects FB looks at to determine if it's a human.

And even if the bot author had all that information, it would still be super hard to write an AI that accomplishes a given task in a way that successfully mimics a human. It would mean winning a 'mouse Turing test'.

> Shouldn't there be an API for that?

What the API returns is under the control of the user. So the API does not help FB to fingerprint you.

This issue touches on the real privacy problem the net is facing. It's not the wrong cookies or privacy policies. It's fingerprinting. There is no technical solution to it.

> Very very hard. Because the bot author does not have the giant database that FB has to analyze how humans move the mouse around.

Don't forget that Facebook's false positive rate should be very low. There are lots of humans on their platform, and they should all pass the test.

This makes it easier to construct a bot that will pass the test.

It won't hurt if humans fail the test every so often, as long as it's under a threshold that humans regularly can overcome.
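To make the false-positive argument concrete: if the detector has to pass almost every human, its cut-off is effectively set by the least human-looking humans, and a bot only needs to clear that bar. A minimal sketch, assuming a hypothetical classifier that gives each session a "humanness" score between 0 and 1 (this is not Facebook's actual method, and the scores below are invented):

```python
import math


def threshold_for_fpr(human_scores: list[float], max_fpr: float) -> float:
    """Highest cut-off that wrongly flags at most `max_fpr` of known-human sessions.

    A session is treated as a bot when its score is strictly below the cut-off.
    """
    if not human_scores or not 0.0 <= max_fpr < 1.0:
        raise ValueError("need some human scores and 0 <= max_fpr < 1")
    scores = sorted(human_scores)
    allowed = math.floor(max_fpr * len(scores))  # humans we may misclassify
    # At most `allowed` scores lie strictly below scores[allowed], and any higher
    # cut-off would also flag scores[allowed] itself.
    return scores[allowed]


def is_flagged(score: float, threshold: float) -> bool:
    return score < threshold


humans = [0.9, 0.2, 0.85, 0.6, 0.95, 0.7, 0.55, 0.99, 0.8, 0.92]  # made-up scores
cutoff = threshold_for_fpr(humans, 0.1)
```

With these ten made-up scores and a 10% false-positive budget, the cut-off is 0.55, so a bot that scores 0.56 already passes.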

I can imagine it would be easy to trick the system a few times (either as a bot pretending to be human, or a human acting like a bot), but tricking it consistently over months or years is going to be damn near impossible.

Also don't forget that Facebook probably has to do all detection in Javascript on the client, i.e. with limited resources. I suspect they don't send every mouse movement to the server. This also means they probably don't have fine-grained historical data.
Not necessarily.

I've only given it a few minutes' thought, but position and time data is really small and easy to compress (you don't need to send anything while the user isn't moving the mouse). If it's sent in batches or over an already open websocket, it's not like it's using a ton of resources on the client.
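A rough sketch of the "only send while the mouse moves, in batches" idea, assuming (t_ms, x, y) samples arrive from some mouse event handler. The class name and batch size are hypothetical, and the actual transport (XHR, websocket) is left out:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MouseBatcher:
    """Buffers (t_ms, x, y) samples and returns a batch once `batch_size` accumulate."""

    batch_size: int = 50
    _buffer: list[tuple[int, int, int]] = field(default_factory=list, init=False)
    _last_pos: tuple[int, int] | None = field(default=None, init=False)

    def add(self, t_ms: int, x: int, y: int) -> list[tuple[int, int, int]] | None:
        if self._last_pos == (x, y):
            return None  # pointer is idle: nothing to record or send
        self._last_pos = (x, y)
        self._buffer.append((t_ms, x, y))
        if len(self._buffer) >= self.batch_size:
            return self.flush()  # caller would send this batch to the server
        return None

    def flush(self) -> list[tuple[int, int, int]]:
        batch, self._buffer = self._buffer, []
        return batch
```

A real client would also call `flush()` on a timer or when the page is hidden, so a half-full batch isn't lost.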

Assuming all of their users (guessing a billion daily active users) are on desktop half of the time (a wildly incorrect assumption, I'm sure), and the mouse position data is 1 MB per person for the data you care about (which, again, seems like a lot), that's 500 TB.
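Spelling out that back-of-the-envelope figure (decimal units, 1 TB = 10^6 MB):

```latex
10^{9}\,\text{users} \times 0.5 \times 1\,\text{MB} = 5 \times 10^{8}\,\text{MB} = 500\,\text{TB}
```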

For $25k you could store it all. That's nothing compared to the benefits of being able to identify bots on your platform.

Yes, a few years back the standard way to do this for conversion optimization was to RLE-compress the data and send it in intervals. Also, the sampling resolution does not need to be in milliseconds.
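One way to read "RLE compress" for pointer data, as a sketch: quantize time into coarse buckets, store deltas between consecutive samples, then run-length encode repeated deltas, which collapses steady stretches of movement. The delta step and the 50 ms bucket are my assumptions, not a description of any specific product:

```python
from itertools import groupby

Sample = tuple[int, int, int]  # (t_ms, x, y)


def encode(samples: list[Sample], time_step_ms: int = 50) -> list[tuple[int, Sample]]:
    """Quantize time, delta-encode, then run-length encode the deltas."""
    deltas = []
    prev = (0, 0, 0)
    for t, x, y in samples:
        q = (t // time_step_ms, x, y)  # coarse time bucket instead of milliseconds
        deltas.append((q[0] - prev[0], q[1] - prev[1], q[2] - prev[2]))
        prev = q
    # groupby merges consecutive identical deltas into (run_length, delta) pairs
    return [(len(list(run)), d) for d, run in groupby(deltas)]


def decode(encoded: list[tuple[int, Sample]], time_step_ms: int = 50) -> list[Sample]:
    """Inverse of encode(); timestamps come back rounded down to their bucket."""
    out = []
    t = x = y = 0
    for count, (dt, dx, dy) in encoded:
        for _ in range(count):
            t, x, y = t + dt, x + dx, y + dy
            out.append((t * time_step_ms, x, y))
    return out
```

The time quantization is lossy by design; positions round-trip exactly.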
You could probably start with recording all of your mouse movements over a period of a month or so. Record speed, acceleration, how straight each move is, how much each move deviates from a straight line, how many times the mouse movement stops along its way to its target, where you place the mouse when you scroll, etc.
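Most of those metrics are cheap to compute from raw (t_ms, x, y) samples. A minimal sketch for a single move; the feature names and the 100 ms pause threshold are illustrative assumptions, and "stops" assumes idle samples aren't recorded, so a long gap between samples means the pointer paused:

```python
import math

Sample = tuple[float, float, float]  # (t_ms, x, y)


def movement_features(samples: list[Sample], stop_ms: float = 100.0) -> dict[str, float]:
    """Summarize one pointer movement (start to target) as a few scalar features."""
    if len(samples) < 3:
        raise ValueError("need at least 3 samples")
    if any(b[0] <= a[0] for a, b in zip(samples, samples[1:])):
        raise ValueError("timestamps must be strictly increasing")

    # Per-interval durations, distances and speeds (pixels per ms).
    dts = [b[0] - a[0] for a, b in zip(samples, samples[1:])]
    dists = [math.hypot(b[1] - a[1], b[2] - a[2]) for a, b in zip(samples, samples[1:])]
    speeds = [d / dt for d, dt in zip(dists, dts)]

    # Acceleration between consecutive intervals, over the gap between interval midpoints.
    accels = [(v1 - v0) / ((dt0 + dt1) / 2)
              for v0, v1, dt0, dt1 in zip(speeds, speeds[1:], dts, dts[1:])]

    path_len = sum(dists)
    (_, ax, ay), (_, bx, by) = samples[0], samples[-1]
    chord = math.hypot(bx - ax, by - ay)

    # Perpendicular distance of every sample from the straight start-to-end line.
    if chord > 0:
        deviations = [abs((x - ax) * (by - ay) - (y - ay) * (bx - ax)) / chord
                      for _, x, y in samples]
    else:
        deviations = [math.hypot(x - ax, y - ay) for _, x, y in samples]

    return {
        "mean_speed": path_len / (samples[-1][0] - samples[0][0]),
        "peak_speed": max(speeds),
        "mean_abs_accel": sum(abs(a) for a in accels) / len(accels),
        "straightness": chord / path_len if path_len > 0 else 1.0,  # 1.0 = perfectly straight
        "max_deviation": max(deviations),
        "stops": sum(1 for dt in dts if dt >= stop_ms),
    }
```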

Using these metrics you could probably start to draw some characteristics of how your mouse acts based on what you are doing and where you are moving your mouse.

This could then probably be used to build some form of algorithm that moves the mouse for you with noise (accelerating up & down along the way, deviation from a straight line, stopping in the middle of the line, etc.).
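A toy version of such a noisy mouse mover, just to show the shape of the idea. The curve (a quadratic Bézier bowed sideways), the cosine ease-in/ease-out speed profile, the jitter size and the pause probability are all my own guesses, not fitted to real data and not claimed to fool any real detector:

```python
from __future__ import annotations

import math
import random

Sample = tuple[float, float, float]  # (t_ms, x, y)


def synthetic_move(start: tuple[float, float], end: tuple[float, float],
                   duration_ms: float = 600.0, step_ms: float = 10.0,
                   rng: random.Random | None = None) -> list[Sample]:
    rng = rng or random.Random()
    (sx, sy), (ex, ey) = start, end
    dx, dy = ex - sx, ey - sy
    length = math.hypot(dx, dy) or 1.0

    # Deviation from a straight line: the Bezier control point is pushed sideways
    # (perpendicular to the start-end line) by up to 20% of the distance.
    bow = rng.uniform(-0.2, 0.2) * length
    cx = (sx + ex) / 2 - dy / length * bow
    cy = (sy + ey) / 2 + dx / length * bow

    # Stopping in the middle: roughly half the moves pause once, somewhere mid-path.
    pause_at = rng.uniform(0.3, 0.7)
    pause_ms = rng.uniform(50.0, 200.0) if rng.random() < 0.5 else 0.0

    n = max(2, round(duration_ms / step_ms))
    path: list[Sample] = []
    for i in range(n + 1):
        u = i / n                               # fraction of active movement time
        s = (1 - math.cos(math.pi * u)) / 2     # ease-in/ease-out: speed up, then slow down
        x = (1 - s) ** 2 * sx + 2 * (1 - s) * s * cx + s ** 2 * ex
        y = (1 - s) ** 2 * sy + 2 * (1 - s) * s * cy + s ** 2 * ey
        if 0 < i < n:                           # small tremor; endpoints stay exact
            x += rng.gauss(0.0, 0.5)
            y += rng.gauss(0.0, 0.5)
        t = i * step_ms + (pause_ms if u > pause_at else 0.0)
        path.append((t, x, y))
    return path
```

Passing a seeded `random.Random` makes a run reproducible; the parameters would be the thing to fit against your own recorded moves.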

I was thinking more of a machine-learning approach, e.g. using a GAN.
Many ways to skin a cat. Not an impossible thing to solve in my book. But I don't deny that it would probably be difficult.
I was thinking the same. Maybe we should do it together...
All those "the attacker won't have the knowledge" arguments underestimate the fact that the attacker can simply run their own websites tracking exactly the same stuff, and thereby gain the same knowledge.

You need to break ReCaptcha? Simple, you implement your own captcha on your own site that's frequently used and whenever you need to solve one you copy the challenge and present it to one of your users.

Same with recording mouse data.

It's an old idea even, very similar to https://xkcd.com/792/

> Very very hard. Because the bot author does not have the giant database that FB has

You don't need to learn from all humans. You need to learn from very few (or just even one).

Not all problems are machine learning problems.

Not only that, this can be made a machine learning problem if needed. I'm a human, so if I train my computer to act like my mouse movement, it's sufficient to fool Facebook. Well, now I realize that this is not as easy as it sounds since, as others pointed out, we don't know how fine-grained Facebook's data is and what they're paying attention to. I'm just saying that theoretically I should be able to train my agent to act just like me.
I don't think it should be that difficult. It's a project for a weekend hackathon or so. Collect the same mouse movement data with some JS on your own website if you own anything mildly popular (or partner with someone and buy it from them), stick it into some off the shelf GAN, job's done. Turing test is broken by modern deep learning.

At least the mouse movements themselves shouldn't be difficult to do, given a source of data. Simulating clicks on the same FB UI elements as real people, with the same statistical properties, is on the other hand where you might lack the data to do it properly.

> stick it into some off the shelf GAN, job's done. Turing test is broken by modern deep learning

This is a cartoon version of deep learning.

Google's already doing it with captcha: https://security.stackexchange.com/questions/78807/how-does-...

But does Facebook track users during everyday sessions, or just during some validation action?

As for the former, I'm pretty sure I'd be flagged as a robot, since I mostly use the keyboard (with some special features in Firefox) because it makes things a lot faster and nicer.

As for the latter, IIRC there are blur/focus(-like) events for the window object. Maybe mouse movements give them better confidence? Because of course you want to make absolutely sure your users are seeing all the ads.

I wouldn't be surprised if they tracked keyboard input as well.
Yeah that's why I think they'd classify me as a robot.
> Stupid cat and mouse game. How difficult would it be for a bot to simulate a human's mouse movements? I suppose not very difficult.

An interesting read on this cat and mouse game: http://idlewords.com/talks/website_obesity.htm

> Stupid cat and mouse game.

I suspect this is part of covering their ass for GDPR. Pose everything as a security problem, so you can claim you have a legitimate interest in tracking all of that.

> How difficult would it be for a bot to simulate a human's mouse movements?

Simulating? Extremely difficult. Perturbing pre-recorded paths is a bit easier to do, but requires pre-recording a lot of paths. One of the fastest ways to get your poker bot banned is to not fix this one way or the other.
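A sketch of what perturbing a pre-recorded path can look like: take one recorded human move, map it onto new start and end points with a rotate/scale/translate transform so its shape is preserved, stretch the timing a little and add a bit of jitter. The function name and the noise ranges are hypothetical choices for illustration:

```python
from __future__ import annotations

import random

Sample = tuple[float, float, float]  # (t_ms, x, y)


def replay_perturbed(recorded: list[Sample], start: tuple[float, float],
                     end: tuple[float, float],
                     rng: random.Random | None = None) -> list[Sample]:
    """Re-use a recorded human move between new endpoints, with small random changes."""
    rng = rng or random.Random()
    z0 = complex(recorded[0][1], recorded[0][2])
    z1 = complex(recorded[-1][1], recorded[-1][2])
    if z0 == z1:
        raise ValueError("recorded move must have distinct start and end points")
    w0, w1 = complex(*start), complex(*end)
    # Similarity transform (rotate + scale + translate) mapping z0 -> w0 and z1 -> w1,
    # which keeps the shape of the human's curve.
    k = (w1 - w0) / (z1 - z0)
    stretch = rng.uniform(0.85, 1.15)  # play the move back a bit slower or faster
    t0 = recorded[0][0]
    last = len(recorded) - 1
    out = []
    for i, (t, x, y) in enumerate(recorded):
        w = w0 + (complex(x, y) - z0) * k
        if 0 < i < last:  # small jitter on interior points only
            w += complex(rng.gauss(0, 0.5), rng.gauss(0, 0.5))
        out.append(((t - t0) * stretch, w.real, w.imag))
    return out
```

The more distinct recordings you rotate through, the less any single path shows up over and over, which is the "requires pre-recording a lot of paths" part.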

> Also, doesn't this conflict with rare types of input devices? Or people with a motor function disability?

It still tracks the mouse movements; it's just that now it can classify those users as disabled or as using an arcane device (both of which are interesting tidbits to add to your advertising profile).

> Shouldn't there be an API for that?

There is one: https://developer.mozilla.org/en-US/docs/Web/API/Page_Visibi...

Although I think it can be a privacy concern and somewhat of an anti-feature. For instance, YouTube uses that API to stop playback on mobile when the page or browser is not in the foreground. Of course, there's an extension for that...

> How difficult would it be for a bot to simulate a human's mouse movements? I suppose not very difficult.

I also suppose it's not very difficult in principle (imitate any nearby human's movements and Facebook should not complain), but it is outside the focus of a typical bot developer and therefore makes the whole project exponentially more difficult.

I actually think it's difficult to simulate mouse movements. Is there even a way to do so using PhantomJS or headless Chrome?
I would imagine that FB is not using these libraries but instead using their engineering teams to develop these "solutions." Perhaps some of our friends are actually "working against" our privacy.