by dxbydt
3394 days ago
>Except when Twitter makes that a heuristic for detecting bots
There are several dozen such heuristics. Maybe not all turned on in prod, but that's a whole other story.
In fact, within three months of joining, way back in 2012, one of my very first tasks was to write a standard data-mining job that would compute the difference between the GPS location during office hours (9am-5pm) and the GPS location during home hours (7pm-7am) of everyone who tweets. A histogram of those differences would tell you about the commute distance of the average American who tweets. You could then bucket by region and say interesting things like the average NY tweeter commutes 25 miles more than the average CA tweeter.
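For the curious, here is a rough sketch of what a job like that might look like. This is not the actual Twitter code: the input shape `(user, local_hour, lat, lon)`, the function names, and the choice of a per-bucket median as the "typical" location are all my assumptions for illustration.

```python
from collections import defaultdict
from math import asin, cos, radians, sin, sqrt
from statistics import median

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def bucket(hour):
    """Classify a local hour (0-23) as office (9am-5pm), home (7pm-7am), or neither."""
    if 9 <= hour < 17:
        return "office"
    if hour >= 19 or hour < 7:
        return "home"
    return None


def commute_miles(tweets):
    """tweets: iterable of (user, local_hour, lat, lon). Returns {user: miles}."""
    points = defaultdict(lambda: {"office": [], "home": []})
    for user, hour, lat, lon in tweets:
        b = bucket(hour)
        if b:
            points[user][b].append((lat, lon))
    result = {}
    for user, p in points.items():
        # Only users seen in both time windows get a commute estimate.
        if p["office"] and p["home"]:
            # Assumption: median lat/lon per bucket is a crude "typical" location.
            centre = {k: (median(x for x, _ in v), median(y for _, y in v))
                      for k, v in p.items()}
            result[user] = haversine_miles(centre["office"], centre["home"])
    return result
```

Feeding the values of `commute_miles(...)` into a histogram, optionally grouped by region, gives the kind of commute-distance picture described above.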
Looking at the results we got, it was clear there was a substantial percentage of bots, because their location varied so widely, minute to minute and hour to hour. The haversine distance between successive GPS fixes should be reasonably stable, because your IP maps to the GPS location (we used the standard MaxMind GeoIP2 API), and those IPs are relatively stable... except if you are a bot and switching IPs willy-nilly.
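One way to turn "location varies too widely" into a number is the implied travel speed between consecutive fixes. To be clear, that is my own stand-in for illustration, not necessarily what the real heuristic did; the function names and the 1000 km/h cutoff are hypothetical, and the lat/lon pairs are assumed to have already been resolved from IPs.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def max_implied_speed_kmh(fixes):
    """fixes: list of (unix_seconds, lat, lon) sorted by time.

    Returns the fastest speed implied by any pair of consecutive fixes.
    """
    fastest = 0.0
    for (t1, la1, lo1), (t2, la2, lo2) in zip(fixes, fixes[1:]):
        hours = (t2 - t1) / 3600
        if hours > 0:  # skip duplicate or out-of-order timestamps
            fastest = max(fastest, haversine_km(la1, lo1, la2, lo2) / hours)
    return fastest


def looks_like_bot(fixes, limit_kmh=1000.0):
    """Hypothetical cutoff: flag accounts that 'move' faster than an airliner."""
    return max_implied_speed_kmh(fixes) > limit_kmh
```

An account that tweets from New York and then from London a minute later trips the check; one that stays put for hours does not.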
This was just one instance, but there were several such projects. Usually interns and new employees would work on these to get their feet wet, and then move on to more substantial projects.