Hacker News
by stringray 4275 days ago
Here's how I would play this game:

Siphon as much data as possible. Keep it indefinitely, but don't synchronously add rows for shadow identities. Instead, build a querying infrastructure that projects your mountain of data into forms you can exploit at runtime. Part of this is sorting activity into unbound identities.
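
Here is a minimal Python sketch of the "project at query time" idea. It assumes each raw event carries a list of identifiers seen together. The `keys` field and the `cookie:`/`contact:`/`device:` labels are hypothetical. It uses union-find, a standard record-linkage technique and not something Facebook is known to use, to sort events into unbound identities on demand. No profile table is ever persisted.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find over arbitrary hashable identifiers."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def project_identities(events):
    """Group raw events into unbound identities at query time.

    Each event is a dict with a hypothetical 'keys' list of identifiers
    observed together. Events sharing any identifier, directly or
    transitively, end up in the same group. The grouping lives only for
    the duration of this call; nothing is written back.
    """
    ds = DisjointSet()
    for event in events:
        keys = event["keys"]
        for other in keys[1:]:
            ds.union(keys[0], other)
    groups = defaultdict(list)
    for i, event in enumerate(events):
        if event["keys"]:  # skip events with no identifiers at all
            groups[ds.find(event["keys"][0])].append(i)
    # Return groups of event indices in a deterministic order.
    return sorted(sorted(g) for g in groups.values())
```

For example, events with keys `["cookie:a", "contact:x"]`, `["cookie:b"]`, `["contact:x", "device:1"]` and `["device:1"]` project to `[[0, 2, 3], [1]]`. The first, third and fourth events chain together through `contact:x` and `device:1`. Discarding the result after use is the "no shadow profile rows" part of the design.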

Now you get to have your cake and eat it too: all of the delicious privacy invasion, with the PR/legal bonus of being able to say you don't have "shadow profiles".

I'm sure Facebook has shadow profiles on everyone, just as I'm sure they're smart enough to prove they don't.


Exactly. Facebook doesn't have shadow profiles; it has user data and algorithms. The algorithms traverse the shadow profiles at runtime and produce some output, but there is no need to hold onto the profile data in that form. All Facebook needs is the advertising "action-item" that the algorithm produces.
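
Here is a toy Python sketch of that flow. It makes two hypothetical simplifications: events are `(identity, topic)` pairs, and the "action-item" is a single segment label. The per-identity aggregate exists only inside the function, and the caller receives just the label.

```python
from collections import Counter


def action_item(events, identity):
    """Return only an ad segment label for one identity, or None.

    The per-identity tally is a local variable that is discarded when
    the function returns; no profile is stored anywhere.
    """
    tally = Counter(topic for who, topic in events if who == identity)
    if not tally:
        return None
    topic, _count = tally.most_common(1)[0]
    return f"segment:{topic}"
```

With events `[("u1", "cycling"), ("u2", "cooking"), ("u1", "cycling"), ("u1", "travel")]`, calling `action_item(events, "u1")` returns `"segment:cycling"`. An unseen identity returns `None`.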

The same analysis applies to the intelligence gathering done by the government. They hold onto all data for all time and draw conclusions from it at a later date.

What's to stop anyone from doing that now using "open graph"/crowdsourcing and geotagging from SEO queries on names and identities? C&D letters from Facebook's lawyers, which they will inevitably send your way after users complain, ask how you're doing what you're doing, and tell you to stop, all while claiming to protect users? Ha… Seriously, with all the public datasets out there associated with identities, do people really need things like user accounts and logins to engage in the same behavior they do on Facebook? Are people naive enough to believe they need them to engage in the same social behaviors in a similar fashion?

With 20 VMs located in unknown places around the world, you can mine everyone from FB and query the data Facebook declares "public" in about two months… for about $200 total. Want to build an open-source facial recognition database from profile photos (graph.facebook.com/{your_user_id}/picture?type=large)? Then use crowdsourcing to improve it for the images that don't make it into the model because they fail simple feature detection in the open-source recognition libraries out there. Want to query an IP address and return a probability distribution over the names associated with the pages visited from it, combined with browser fingerprinting techniques written about in enough detail to bore anyone? You probably can't build a business on it within the US (maybe), but that's what legal arbitrage is for.

Things existed before Facebook; things will exist after…