| Hi, I'm the product manager for Cloudflare Analytics. Thanks for this thorough and thoughtful review. We are totally serious about building a world-class, privacy-first, free analytics product. At risk of HN cliche, this is our "early work". We are actively working to fix many of the rough edges mentioned here; if we had waited to fix all of them before shipping, we never would have shipped! For folks who haven't seen it, I suggest checking out our launch blog post [0], which gives some more context around edge vs browser analytics (spoiler: we do both!), why we count visits the way we do, and how we handle bot traffic. We know we have work to do on the "jagged lines" problem. For some low-traffic websites, we might show noisier, lower-resolution data than is ideal. (We've artificially constrained our analytics to query a maximum of 7 days at a time because this problem is exacerbated with longer time ranges.) My colleague Jamie wrote a nice blog post about how and why we sample data [1]. In short: we have an existing customer base of 25 million+ Internet properties, whose traffic volume spans 9 orders of magnitude! Sampling data is an elegant approach that allows us to serve fast, flexible analytics for all our customers. Sampling shouldn't be feared, but we know we can do better in some cases. We've recently merged some deep-in-the-weeds improvements to ClickHouse [2] that should result in improved resolution. And we're currently working to store full-resolution data for the smallest websites. Happy to address any other specific points that folks have questions about.
[0] https://blog.cloudflare.com/free-privacy-first-analytics-for...
[1] https://blog.cloudflare.com/explaining-cloudflares-abr-analy...
[2] https://github.com/ClickHouse/ClickHouse/pull/14221 |
Well, I would say the opposite: sampling should absolutely be feared. In a lot of cases sampling is not an issue (the home page, or any popular page), but in others, including checkout pages, product pages, and low-visibility pages, sampling can make a massive difference. When working with sampled data, you should always keep that in mind.
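To put rough numbers on that, here is a minimal sketch. It assumes plain uniform sampling at a fixed rate, which is a simplification and not necessarily how Cloudflare's adaptive sampling behaves. The 1% rate, the page names, and the view counts are all made up for illustration. If each view is kept with probability `sample_rate` and the sampled count is scaled back up, the relative standard error of the estimate is sqrt((1 - p) / (N * p)).

```python
import math


def relative_std_error(true_count: int, sample_rate: float) -> float:
    """Relative standard error of the scaled-up estimate k / sample_rate,
    where k ~ Binomial(true_count, sample_rate).

    Assumes independent, uniform sampling of each view (an illustrative
    simplification, not a description of any specific product).
    """
    if true_count <= 0:
        raise ValueError("true_count must be positive")
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return math.sqrt((1 - sample_rate) / (true_count * sample_rate))


# Hypothetical numbers, chosen only to contrast a popular page
# with a low-traffic one at the same sampling rate.
SAMPLE_RATE = 0.01
pages = {"home page": 1_000_000, "checkout page": 200}

for name, views in pages.items():
    rse = relative_std_error(views, SAMPLE_RATE)
    print(f"{name}: {views} views, relative error about {rse:.1%}")
```

Under these assumptions the home page estimate is off by about 1% on average, while the checkout page estimate is off by about 70%, which is the gap between "not an issue" and "massive difference".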