by drostie
5224 days ago
Maybe, but there would also have to be passes through the data afterward to link identities. Hashing is dangerous since browsers are living, dynamic beasts. When someone updates their browser, their useragent changes, and you'll want to keep their new identity as an extension of their old one. Not to mention that people use multiple browsers. So there's going to be a vital step of "linking the new identity to old ones", which can happen on a separate, dedicated thread, but you'll need to keep the data to do it. You'll probably truncate ultralarge fields and then gzip them or so, rather than just hashing them.

One interesting thought: how much space would you need to pull this off? Chromium generates 12 KB of data which can gzip to 3 KB, Firefox generates 5 KB of data which can gzip to a little over 1 KB. Truncate-then-gzip could be used to keep perhaps 0 - 4 KB per person. Assume that your average user uses ~2 KB. That's still rather a lot compared with what you can do with counters, which take 8 bytes or so to store. If you wanted to keep your database under 2 GB, you could only handle a million people, not hundreds of millions. So it would really be a big distributed project to link identities as they evolve over time. I imagine that's one huge factor in using tracking cookies; they're the lazy way to scale.
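To make the truncate-then-gzip idea and the space estimate concrete, here is a rough Python sketch. The field names, the 1 KB per-field cap, and the function names `pack_fingerprint` and `storage_bytes` are all made up for illustration; nothing here is a real tracker's format.

```python
import gzip

# Hypothetical cap on any single field before compression (an assumption).
MAX_FIELD_BYTES = 1024


def pack_fingerprint(fields: dict[str, str],
                     max_field_bytes: int = MAX_FIELD_BYTES) -> bytes:
    """Truncate each field to max_field_bytes, then gzip the whole record."""
    lines = []
    for name in sorted(fields):  # fixed field order keeps the layout stable
        value = fields[name].encode("utf-8")[:max_field_bytes]
        lines.append(name.encode("utf-8") + b"=" + value)
    # mtime=0 keeps the current timestamp out of the gzip header
    return gzip.compress(b"\n".join(lines), mtime=0)


def storage_bytes(users: int, bytes_per_user: int) -> int:
    """Total storage for a given number of users at a fixed per-user size."""
    return users * bytes_per_user
```

With the ~2 KB per user figure, `storage_bytes(1_000_000, 2_000)` comes to about 2 GB and `storage_bytes(100_000_000, 2_000)` to about 200 GB, whereas 8-byte counters for a hundred million users need only about 800 MB.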
An interesting project might be to create a database with a table that uses the useragent hash as its primary key, and to associate each identity in the user table with a number of these useragent hashes.