by XCSme
1867 days ago
I am creating a "privacy-focused" analytics tool[0] that actually provides useful stats. Compared to other tools, its privacy comes from being self-hosted: no data is shared with third parties, which is the best way to achieve data privacy.
You can detect returning visitors in various ways. One option in userTrack is to store a hash of the visitor's IP address and user-agent string. It is not 100% accurate: if the visitor updates his browser or his IP changes, he will be considered a new user.
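A minimal sketch of the IP + user-agent approach, assuming SHA-256 as the hash and a `|` separator between the two fields. `visitor_hash` and `is_returning` are hypothetical names for illustration, not userTrack's actual code. It also shows the limitation described above: a browser update changes the user-agent string and therefore the key.

```python
import hashlib


def visitor_hash(ip: str, user_agent: str) -> str:
    """Hypothetical helper: derive a visitor key from IP + User-Agent.

    Only this digest would be stored, not the raw IP or user-agent.
    """
    # The separator keeps ("1.2.3.4", "5x") and ("1.2.3.45", "x") distinct.
    raw = f"{ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def is_returning(seen: set, ip: str, user_agent: str) -> bool:
    """Return True if this IP + UA pair was seen before; record it either way."""
    key = visitor_hash(ip, user_agent)
    returning = key in seen
    seen.add(key)
    return returning


if __name__ == "__main__":
    seen = set()
    ua = "Mozilla/5.0 (X11; Linux x86_64) Firefox/115.0"
    print(is_returning(seen, "203.0.113.7", ua))  # False: first visit
    print(is_returning(seen, "203.0.113.7", ua))  # True: same IP + UA
    # A browser update changes the UA string, so the same person looks new.
    print(is_returning(seen, "203.0.113.7", ua.replace("115", "116")))  # False
```

An in-memory `set` stands in for whatever storage a real tool would use; the point is only that the identity is exactly as stable as the IP and user-agent it is derived from.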
If the user is logged in, you can tag each session with his username or user ID. Also keep in mind that fully persistent identities rarely exist (unless the user is logged in), since cookies can be cleared at any time or simply be blocked or reset by the browser on each visit.
PS: I do agree that many privacy-focused tools are not really private either, because they are still a third party aggregating data across the web.
[0]: https://www.usertrack.net/
Nowadays, privacy is a pretty convoluted word. I like to consider it from the point of view of the most impacted actor: the end user. From his perspective, you remain a third party as far as his data is concerned. The mere fact that the tool is self-hosted cannot guarantee privacy, although stronger privacy is more likely when the number of third parties is small.
Therefore, with your tool:
- Either you have an identity (e.g., hash(x,y,z)) that is persisted over time (regardless of its accuracy).
- Or you have an identity that is forgotten after a certain period of time (e.g., 24h).
In the first case, it cannot be considered a privacy-focused tool, and in the second case, it has the same shortcomings I've described in the original question.
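For the second case, one way to get an identity that is forgotten after 24h is to mix a random per-day salt into the hash and discard the salt when the day changes. The sketch below assumes HMAC-SHA256, a 32-byte salt, and events processed in chronological order; `DailyVisitorIds` is a hypothetical name, not any particular tool's implementation. The usage at the bottom shows the shortcoming in numbers: one person visiting on seven different days becomes seven "unique" visitors.

```python
import hashlib
import hmac
import secrets
from datetime import date
from typing import Optional


class DailyVisitorIds:
    """Hypothetical sketch: visitor IDs that cannot be linked across days.

    A random salt is generated per calendar day and overwritten when the
    day changes, so the tool itself cannot relate IDs from different days.
    Assumes events arrive in chronological order.
    """

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._salt = b""

    def _salt_for(self, day: date) -> bytes:
        # On a new day, replace the salt; the previous one is not kept anywhere.
        if day != self._day:
            self._day = day
            self._salt = secrets.token_bytes(32)
        return self._salt

    def visitor_id(self, ip: str, user_agent: str, day: date) -> str:
        salt = self._salt_for(day)
        msg = f"{ip}|{user_agent}".encode("utf-8")
        return hmac.new(salt, msg, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    ids = DailyVisitorIds()
    week = [date(2024, 1, d) for d in range(1, 8)]
    # The same person visits once a day for a week...
    uniques = {ids.visitor_id("203.0.113.7", "Mozilla/5.0", d) for d in week}
    print(len(uniques))  # 7: counted as a new visitor every day
    # ...but repeat visits within the same day share one ID.
    same_day = {ids.visitor_id("203.0.113.7", "Mozilla/5.0", week[-1]) for _ in range(3)}
    print(len(same_day))  # 1
```

Daily counts stay meaningful, but any metric spanning more than one day (weekly uniques, retention, returning-visitor rate) is inflated by construction.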
---
It is crucial to note that the question is about the quality of user metrics in privacy-focused tools.
There ain't no such thing as a free lunch: end users' privacy comes at the expense of actionable metrics. Furthermore, at best, people using these tools are unaware of the shortcomings and the risk of misleading numbers. At worst, these very concerns are left out of the tools' marketing to minimize their real impact.
The above is an opinion, and I would like to debate it: my possible misunderstanding of these tools, and possible solutions.